 TX System RISC TX79 Core Architecture (Symmetric 2-way superscalar 64-bit CPU) Rev. 2.0
The information contained herein is subject to change without notice.
The information contained herein is presented only as a guide for the applications of our products. No responsibility is assumed by TOSHIBA for any infringements of patents or other rights of the third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of TOSHIBA or others.
TOSHIBA is continually working to improve the quality and reliability of its products. Nevertheless, semiconductor devices in general can malfunction or fail due to their inherent electrical sensitivity and vulnerability to physical stress. It is the responsibility of the buyer, when utilizing TOSHIBA products, to comply with the standards of safety in making a safe design for the entire system, and to avoid situations in which a malfunction or failure of such TOSHIBA products could cause loss of human life, bodily injury or damage to property. In developing your designs, please ensure that TOSHIBA products are used within specified operating ranges as set forth in the most recent TOSHIBA products specifications. Also, please keep in mind the precautions and conditions set forth in the "Handling Guide for Semiconductor Devices," or "TOSHIBA Semiconductor Reliability Handbook" etc.
The Toshiba products listed in this document are intended for usage in general electronics applications (computer, personal equipment, office equipment, measuring equipment, industrial robotics, domestic appliances, etc.). These Toshiba products are neither intended nor warranted for usage in equipment that requires extraordinarily high quality and/or reliability or a malfunction or failure of which may cause loss of human life or bodily injury ("Unintended Usage"). Unintended Usage includes atomic energy control instruments, airplane or spaceship instruments, transportation instruments, traffic signal instruments, combustion control instruments, medical instruments, all types of safety devices, etc. Unintended Usage of Toshiba products listed in this document shall be made at the customer's own risk.
The products described in this document may include products subject to the foreign exchange and foreign trade laws.
(c) 2001 TOSHIBA CORPORATION All Rights Reserved
Preface
Thank you for choosing Toshiba semiconductor products. This is the year 2000 edition of the user's manual for the architecture of the TX79 RISC microprocessor core, a member of the TX System RISC Family of Toshiba microprocessors.
This user's manual is designed to be easily understood by engineers who are designing a Toshiba microprocessor into their products for the first time. No special knowledge of this architecture is assumed: the contents include basic information about the architecture of the TX79 microprocessor core as well as more advanced, in-depth descriptions.
Toshiba is continually updating its technical publications. Any comments and suggestions regarding any Toshiba document are most welcome and will be taken into account when subsequent editions are prepared. To receive updates to the information in this manual, or for additional information about this architecture, please contact your nearest Toshiba office or authorized Toshiba dealer.
April 2001
Contents
Handling Precautions C790 User's Manual
1. Introduction ..... 1-1
  1.1 Features ..... 1-2
  1.2 Related Documents ..... 1-3
  1.3 Revision History ..... 1-4
  1.4 Conventions Used in This Manual ..... 1-5
  1.5 Restrictions for Use of the C790 CPU Core ..... 1-6
2. Architecture Overview ..... 2-1
  2.1 Block Diagram and Functional Block Descriptions ..... 2-2
    2.1.1 PC Unit ..... 2-3
    2.1.2 MMU ..... 2-3
    2.1.3 Caches ..... 2-3
    2.1.4 Issue Logic and Staging Registers ..... 2-3
    2.1.5 GPR (General Purpose Registers) and FPR (Floating-Point Registers) ..... 2-3
    2.1.6 The Five Execution Pipes ..... 2-3
      2.1.6.1 I0 and I1 Pipes ..... 2-3
      2.1.6.2 LS - Load/Store Pipe ..... 2-3
      2.1.6.3 BR - Branch Pipe ..... 2-3
      2.1.6.4 C1 - COP1/FPU Pipe ..... 2-3
    2.1.7 Operand/Bypass logic ..... 2-4
    2.1.8 Response Buffer and Writeback Buffer ..... 2-4
    2.1.9 UCAB ..... 2-4
    2.1.10 Result and Move Buses ..... 2-4
    2.1.11 Bus Interface Unit and BIU Bus ..... 2-4
  2.2 Superscalar Pipeline Operation ..... 2-5
    2.2.1 Integer Instruction Pipeline Stages ..... 2-5
    2.2.2 C1 (COP1/FPU) Instruction Pipeline Stages ..... 2-8
    2.2.3 Classification and Routing of Instructions According to Execution Pipelines ..... 2-10
    2.2.4 Instruction Issue Combinations ..... 2-12
  2.3 Registers ..... 2-14
    2.3.1 CPU Registers ..... 2-14
    2.3.2 FPU Registers ..... 2-14
    2.3.3 COP0 Registers ..... 2-15
  2.4 Memory Management ..... 2-16
  2.5 Cache Memory ..... 2-17
  2.6 Bus Interface ..... 2-18
  2.7 Floating Point Unit ..... 2-18
  2.8 Performance Counter ..... 2-19
  2.9 Debug and Tracing Functions ..... 2-19
3. Instruction Set Overview and Summary ..... 3-1
  3.1 Introduction ..... 3-2
  3.2 CPU Instruction Set Formats ..... 3-3
  3.3 Instruction Set Summary ..... 3-4
    3.3.1 Load/Store Instructions ..... 3-4
      3.3.1.1 Normal Loads and Stores ..... 3-4
      3.3.1.2 Multimedia Loads and Stores ..... 3-5
      3.3.1.3 Coprocessor Loads and Stores ..... 3-5
      3.3.1.4 Data Formats and Addressing ..... 3-5
      3.3.1.5 Defining Access Types ..... 3-9
      3.3.1.6 Scheduling a Load Delay Slot ..... 3-13
    3.3.2 Computational Instructions ..... 3-14
      3.3.2.1 ALU Immediate Instructions ..... 3-14
      3.3.2.2 Three Operand Register-Type Instructions ..... 3-15
      3.3.2.3 Shift Instructions ..... 3-15
      3.3.2.4 Multiply and Divide Instructions ..... 3-15
      3.3.2.5 64-Bit Operations ..... 3-15
    3.3.3 Jump and Branch Instructions ..... 3-16
      3.3.3.1 Jump Instructions ..... 3-16
      3.3.3.2 Branch Instructions ..... 3-17
    3.3.4 Miscellaneous Instructions ..... 3-18
      3.3.4.1 Exception Instructions ..... 3-18
      3.3.4.2 Serialization Instructions ..... 3-18
      3.3.4.3 MIPS IV Instructions ..... 3-19
    3.3.5 System Control Coprocessor (COP0) Instructions ..... 3-20
    3.3.6 Coprocessor 1 (COP1) ..... 3-21
      3.3.6.1 Coprocessor 1 (COP1) Instructions ..... 3-21
    3.3.7 C790-Specific Instructions ..... 3-22
      3.3.7.1 Integer Multiply / Divide Instructions ..... 3-22
      3.3.7.2 Multimedia Instructions ..... 3-23
  3.4 User Instruction Latency and Repeat Rate ..... 3-25
4. CPU and COP0 Registers ..... 4-1
  4.1 CPU Registers ..... 4-2
    4.1.1 General Purpose Registers ..... 4-4
    4.1.2 HI and LO Registers ..... 4-4
    4.1.3 Shift Amount (SA) Register ..... 4-4
    4.1.4 Program Counter (PC) ..... 4-4
  4.2 System Control Coprocessor (COP0) Registers ..... 4-5
    4.2.1 Index Register (0) ..... 4-6
    4.2.2 Random Register (1) ..... 4-7
    4.2.3 EntryLo0 Register (2), and EntryLo1 Register (3) ..... 4-8
    4.2.4 Context Register (4) ..... 4-9
    4.2.5 PageMask Register (5) ..... 4-10
    4.2.6 Wired Register (6) ..... 4-11
    4.2.7 BadVAddr Register (8) ..... 4-12
    4.2.8 Count Register (9) ..... 4-13
    4.2.9 EntryHi Register (10) ..... 4-14
    4.2.10 Compare Register (11) ..... 4-15
    4.2.11 Status Register (12) ..... 4-16
      4.2.11.1 Status Register Format ..... 4-17
      4.2.11.2 Status Register Modes and Access States ..... 4-18
    4.2.12 Cause Register (13) ..... 4-19
    4.2.13 EPC Register (14) ..... 4-21
    4.2.14 PRId Register (15) ..... 4-22
    4.2.15 Config Register (16) ..... 4-23
    4.2.16 BadPAddr Register (23) ..... 4-25
    4.2.17 Debug Registers (24) ..... 4-26
    4.2.18 Performance Counter Registers (25) ..... 4-28
    4.2.19 TagLo (28) and TagHi (29) Registers ..... 4-31
    4.2.20 ErrorEPC (30) ..... 4-33
5. Exception Processing and Reset ..... 5-1
  5.1 The Exception Handling Process ..... 5-2
    5.1.1 Level 1 Exceptions ..... 5-2
    5.1.2 Level 2 Exceptions ..... 5-5
  5.2 Exception Vector Locations ..... 5-7
  5.3 Cause Register Setting ..... 5-8
  5.4 Masking an Exception ..... 5-9
  5.5 Detailed Description ..... 5-10
    5.5.1 Exception Priority ..... 5-10
    5.5.2 Reset Exception ..... 5-11
    5.5.3 Non-Maskable Interrupt (NMI) Exception ..... 5-12
    5.5.4 Performance Counter Exception ..... 5-13
    5.5.5 Debug Exception ..... 5-14
    5.5.6 Address Error Exception ..... 5-15
    5.5.7 TLB Refill Exception ..... 5-16
    5.5.8 TLB Invalid Exception ..... 5-17
    5.5.9 TLB Modified Exception ..... 5-18
    5.5.10 Bus Error Exception ..... 5-19
    5.5.11 System Call Exception ..... 5-20
    5.5.12 BREAK Instruction Exception ..... 5-21
    5.5.13 Reserved Instruction Exception ..... 5-22
    5.5.14 Coprocessor Unusable Exception ..... 5-23
    5.5.15 Interrupt Exception ..... 5-24
    5.5.16 SIO Exception ..... 5-25
    5.5.17 Integer Overflow Exception ..... 5-26
    5.5.18 Trap Exception ..... 5-27
    5.5.19 Floating-Point Exception ..... 5-28
6. Memory Management ..... 6-1
  6.1 Translation Look-aside Buffer (TLB) ..... 6-2
    6.1.1 Translation Status ..... 6-2
    6.1.2 Multiple Matches ..... 6-2
  6.2 Address Spaces ..... 6-3
    6.2.1 Virtual Address Space ..... 6-3
    6.2.2 Physical Address Space ..... 6-4
    6.2.3 Virtual-to-Physical Address Translation ..... 6-4
    6.2.4 32-bit Address Translation Mode ..... 6-5
    6.2.5 Operating Modes ..... 6-6
    6.2.6 User Mode Operations ..... 6-8
    6.2.7 Supervisor Mode Operations ..... 6-10
    6.2.8 Kernel Mode Operations ..... 6-11
  6.3 System Control Coprocessor ..... 6-14
    6.3.1 Format of a TLB Entry ..... 6-15
  6.4 Virtual-to-Physical Address Translation Process ..... 6-18
  6.5 TLB Instructions ..... 6-20
7. Caches ..... 7-1
  7.1 Cache Features ..... 7-2
  7.2 Organization of the Caches ..... 7-3
    7.2.1 Data Cache ..... 7-3
    7.2.2 Instruction Cache ..... 7-4
    7.2.3 Tag Structure ..... 7-5
      7.2.3.1 Data Cache Tag Structure ..... 7-6
      7.2.3.2 Instruction Cache Tag Structure ..... 7-6
    7.2.4 State of Cache Tags After Reset ..... 7-7
  7.3 Cache Operations ..... 7-8
    7.3.1 Line Replacement Algorithm ..... 7-8
    7.3.2 Non-blocking Loads and Hit Under Miss ..... 7-8
    7.3.3 Cache Miss and Hit Operations ..... 7-9
    7.3.4 Data Cache Writeback Policy ..... 7-10
    7.3.5 Data Cache State Transitions ..... 7-11
    7.3.6 Instruction Cache State Transitions ..... 7-12
    7.3.7 Data Cache Lock Function ..... 7-12
      7.3.7.1 Operations During Lock ..... 7-13
    7.3.8 Relationship Between Cached and Uncached Operations ..... 7-13
  7.4 Uncached Accelerated Buffer ..... 7-14
    7.4.1 UCAB Configuration ..... 7-14
    7.4.2 Tag Structure ..... 7-14
    7.4.3 Non-blocking Loads and Hit Under Miss ..... 7-14
  7.5 Cache Control Registers ..... 7-15
  7.6 CACHE Instruction ..... 7-16
8. CPU Bus ..... 8-1
  8.1 Introduction ..... 8-2
    8.1.1 Terminology ..... 8-3
    8.1.2 Signal Naming Convention ..... 8-3
  8.2 CPU Bus Architecture ..... 8-4
    8.2.1 CPU Bus Connectivity for Address and Control Paths ..... 8-5
    8.2.2 CPU Bus Connectivity for Data Paths ..... 8-6
  8.3 CPU Bus Signal Descriptions ..... 8-7
    8.3.1 Address Bus Signals ..... 8-7
  8.4 Overview of CPU Bus Operations ..... 8-12
    8.4.1 CPU Bus Operations ..... 8-12
    8.4.2 Processor Requests ..... 8-12
      8.4.2.1 Read Requests ..... 8-12
      8.4.2.2 Write Requests ..... 8-13
    8.4.3 Bus Error Operations ..... 8-13
  8.5 CPU Bus Transaction Protocols and Timing ..... 8-14
    8.5.1 Arbitration Operations ..... 8-14
      8.5.1.1 Cycle Stealing ..... 8-15
    8.5.2 CPU Single Operations ..... 8-16
      8.5.2.1 CPU Single Reads ..... 8-16
      8.5.2.2 CPU Single Writes ..... 8-17
      8.5.2.3 CPU Single Read-Write-Read-Write Cycles ..... 8-18
    8.5.3 CPU Burst Operations ..... 8-19
      8.5.3.1 CPU Burst Reads ..... 8-19
      8.5.3.2 CPU Burst Writes ..... 8-20
      8.5.3.3 CPU Burst Read-Write Cycles ..... 8-21
      8.5.3.4 CPU Burst Write-Read Cycles ..... 8-21
    8.5.4 CPU Non-Pipeline Single Operations ..... 8-22
      8.5.4.1 CPU Non-Pipeline Single Reads ..... 8-22
      8.5.4.2 CPU Non-Pipeline Single Writes ..... 8-23
    8.5.5 CPU Non-Pipeline Burst Operations ..... 8-23
      8.5.5.1 CPU Non-Pipeline Burst Reads ..... 8-23
      8.5.5.2 CPU Non-Pipeline Burst Writes ..... 8-24
    8.5.6 Bus Error Operations ..... 8-25
      8.5.6.1 Bus Error Exceptions ..... 8-25
      8.5.6.2 CPU Bus Cycle Termination ..... 8-26
      8.5.6.3 Bus Error Timing with No Pending Operation ..... 8-26
      8.5.6.4 Bus Error Timing with One Pending Operation ..... 8-26
      8.5.6.5 Bus Error Timing with Two Pending Operations ..... 8-28
9. Performance Counter ..... 9-1
  9.1 Overview ..... 9-2
  9.2 Performance Counters and Performance Control Registers ..... 9-2
    9.2.1 Accessing Counters and Registers ..... 9-3
    9.2.2 State of Performance Counter Control Registers Upon Reset ..... 9-4
  9.3 Counter Operation ..... 9-5
    9.3.1 Counter Events ..... 9-6
      9.3.1.1 Event Descriptions ..... 9-7
    9.3.2 Handling Performance Counter Exceptions ..... 9-10
    9.3.3 Priority of Counter Exceptions ..... 9-11
    9.3.4 Initializing Counters ..... 9-11
    9.3.5 The Note to Read Counters ..... 9-12
10. Floating-Point Unit, CP1 (Option)
10.1 Overview
10.2 Floating Point Register
10.2.1 Floating-Point General Registers (FGRs)
10.2.2 Floating-Point Registers (FPRs)
10.2.3 Floating-Point Control Registers
10.2.4 Accessing the FP Control and Implementation/Revision Registers
10.3 Floating-Point Formats
10.4 Binary Fixed-Point Format
10.5 Floating-Point Instruction Set Summary
10.5.1 Load, Store and Move Instructions (Table 10-10)
10.5.2 Conversion Instructions (Table 10-11)
10.5.3 Computational Instructions (Table 10-12)
10.5.4 Compare and Branch Instructions (Table 10-13)
11. Floating-Point Exception (Option)
11.1 Introduction
11.2 Exception Types
11.3 Exception Trap Processing
11.4 Flags
11.5 FPU Exceptions
11.6 Saving and Restoring State
11.7 Trap Handlers for IEEE Standard 754 Exceptions
12. PC Trace
12.1 Real-Time PC Tracing
12.1.1 Classification of Branch and Jump Instructions
12.1.2 PC Trace Signals
12.1.3 Priority of Target Addresses
12.1.4 Examples of PC Tracing
12.1.4.1 Sequential Execution
12.1.4.2 Conditional Branch
12.1.4.3 Indirect Jump (Target in Phase A)
12.1.4.4 Indirect Jump (Target in Phase B)
12.1.4.5 Indirect Jump (During Target PC Output)
12.1.4.6 Exception (Target in Phase B)
12.1.4.7 Exception (During Target PC Output)
12.1.4.8 Exception Generated by Branch or Jump Instruction
12.1.4.9 Exception Generated by Branch Delay Slot Instruction
12.1.4.10 Exception Generated by Target Instruction
12.1.4.11 Back to Back Exceptions (Case I)
12.1.4.12 Back to Back Exceptions (Case II)
13. Hardware Breakpoint
13.1 Hardware Breakpoint
13.1.1 Hardware Breakpoint signal
13.2 Breakpoint Registers
13.2.1 Breakpoint Control Register (BPC)
13.2.2 Instruction Address Breakpoint Register (IAB) / Instruction Address Breakpoint Mask Register (IABM)
13.2.3 Data Address Breakpoint Register (DAB) / Data Address Breakpoint Mask Register (DABM)
13.2.4 Data Value Breakpoint Register (DVB) / Data Value Breakpoint Mask Register (DVBM)
13.3 Setting Breakpoint
13.3.1 Sequence of Setting Breakpoint
13.3.2 Instruction Breakpointing
13.3.3 Data Address Breakpointing
13.3.4 Breakpointing by Data Address and Value
13.3.5 Data Value Breakpointing
13.4 Triggering External Probes
13.5 Important notice on using hardware breakpoint
A. CPU Instruction Set Details
A.1 Description of an Instruction
A.1.1 Instruction Mnemonic and Name
A.1.2 Instruction Encoding Picture
A.1.3 Format
A.1.4 Purpose
A.1.5 Description
A.1.6 Restrictions
A.1.7 Operation
A.1.8 Exceptions
A.1.9 Programming Notes, Implementation Notes
A.2 Instruction Description Notation and Functions
A.2.1.1 Pseudocode Language Statement Execution
A.2.1.2 Pseudocode Symbols
A.2.2 Definitions of Pseudocode Functions Used in Instruction Descriptions
A.2.2.1 Coprocessor General Register Access Pseudocode Functions
A.2.2.2 Load and Store Memory Pseudocode Functions
A.2.2.3 Miscellaneous Functions
A.3 CPU Instruction Formats
A.4 Instruction Descriptions
A.5 CPU Instruction Encoding
B. C790-Specific Instruction Set Details
B.1 Conventions Used in This Chapter
B.1.1 Instruction Description Notation and Functions
B.1.2 Pseudocode Language Statement Execution
B.1.3 Pseudocode Symbols
B.2 Definitions for Pseudocode Functions Used in Operation Descriptions
B.3 Summary of C790-Specific Instructions
B.3.1 Multiply and Multiply-Add Instructions
B.3.2 Multimedia Instructions
B.4 Instruction Set Details
B.5 C790-Specific Instruction Encoding
C. COP0 System Control Coprocessor Instruction Set Details
C.1.1 Notes on the CACHE Instruction Sub-operations
Cache Virtual Address
Cache Physical Address
BTAC Virtual Address
BTAC Index Bits
COP0 Not Usable
TLB Exceptions on Cache Operations
Hit Sub-operation Accesses
Breakpoint Exception
Address Error Exception
C.1.2 Sub-Operation Descriptions
C.1.3 Updates of Data Tag Status Bits
C.2 COP0 Instruction Encoding
D. COP1 (FPU) Instruction Set Details
D.1 Conventions Used in This Chapter
D.1.1 Instruction Description Notation and Functions
D.1.2 Pseudocode Language Statement Execution
D.1.3 Pseudocode Symbols
D.2 Definitions for Pseudocode Functions Used in Operation Descriptions
D.3 Instruction Descriptions
D.4 COP1 Instruction Encoding
FIGURES
Figure 2-1. C790 Block Diagram
Figure 2-2. C790 Integer Instruction Pipeline
Figure 2-3. FPU Pipeline
Figure 2-4. Instruction Routing in Logical Pipes and Physical Pipes
Figure 3-1. CPU Instruction Formats
Figure 3-2. Big-Endian Byte Ordering
Figure 3-3. Little-Endian Byte Ordering
Figure 3-4. Little-Endian Data in a Doubleword
Figure 3-5. Big-Endian Data in a Doubleword
Figure 3-6. Big-Endian Misaligned Word Addressing
Figure 3-7. Little-Endian Misaligned Word Addressing
Figure 4-1. CPU Registers
Figure 4-2. Index Register
Figure 4-3. Random Register
Figure 4-4. EntryLo0 and EntryLo1 Registers
Figure 4-5. Context Register Format
Figure 4-6. PageMask Register
Figure 4-7. Wired Register
Figure 4-8. Wired Register Boundary
Figure 4-9. BadVAddr Register
Figure 4-10. Count Register
Figure 4-11. EntryHi Register
Figure 4-12. Compare Register
Figure 4-13. Status Register
Figure 4-14. Cause Register
Figure 4-15. EPC Register
Figure 4-16. PRId Register
Figure 4-17. Config Register Format
Figure 4-18. BadPAddr Register Format
Figure 4-19. Performance Counter Registers
Figure 4-20. TagLo and TagHi Registers
Figure 4-21. ErrorEPC Register
Figure 5-1. Level 1 Exception processing flowchart
Figure 5-2. Level 2 Exception processing flowchart
Figure 6-1. Overview of a Virtual-to-Physical Address Translation
Figure 6-2. 32-bit Mode Virtual Address Translation
Figure 6-3. State Transition among Operating Modes
Figure 6-4. User Mode Virtual Address Space
Figure 6-5. Supervisor Mode Virtual Address Space
Figure 6-6. Kernel Mode Address Space
Figure 6-7. COP0 Registers and the TLB
Figure 6-8. Format of a TLB Entry
Figure 6-9. TLB Address Translation
Figure 7-1. Organization of Data Cache
Figure 7-2. Organization of Instruction Cache
Figure 7-3. Read Missed Processed in Sequential Order
Figure 7-4. Data Cache Transition Diagram, Writeback Protocol
Figure 7-5. Instruction Cache Transition Diagram
Figure 8-1. CPU Bus Architecture
Figure 8-2. CPU Bus Address and Control Path Connections in System
Figure 8-3. CPU Bus Data Path Connections in System
Figure 8-4. Connection of Arbitration Signals
Figure 8-5. Arbitration Protocol
Figure 8-6. Cycle Stealing Protocol
Figure 8-7. CPU Single Reads
Figure 8-8. CPU Single Writes
Figure 8-9. CPU Single Read-Write-Read-Write Cycles
Figure 8-10. CPU Burst Reads
Figure 8-11. CPU Burst Writes
Figure 8-12. CPU Burst Read-Write Cycles
Figure 8-13. CPU Burst Write-Read Cycles
Figure 8-14. CPU Non-Pipeline Single Reads
Figure 8-15. CPU Non-Pipeline Single Writes
Figure 8-16. CPU Non-Pipeline Burst Reads
Figure 8-17. CPU Non-Pipeline Burst Writes
Figure 8-18. One Operation with BUSERR* as the Last SYSDACK*
Figure 8-19. One Operation with BUSERR* as SYSAACK*
Figure 8-20. One Operation with BUSERR* as SYSAACK* and the Last SYSDACK*
Figure 8-21. Two Operations with Bus Error as the Last SYSDACK*
Figure 9-1. Format of the Performance Counter Control Register PCCR
Figure 9-2. Format of Performance Counter Registers PCR0 and PCR1
Figure 9-3. CAUSE Register Fields
Figure 10-1. FP Registers
Figure 10-2. Implementation/Revision Register
Figure 10-3. FP Control/Status Register Bit Assignments
Figure 10-4. Control/Status Register Cause, Flag, and Enable Fields
Figure 10-5. Single-Precision Floating-Point Format
Figure 10-6. Double-Precision Floating-Point Format
Figure 10-7. Binary Word Fixed-Point Format
Figure 10-8. Binary Long Fixed-Point Format
Figure 11-1. Control/Status Register Exception/Flag/Trap/Enable Bits
Figure 12-1. Priority of Outputting Jump or Exception Target
Figure 12-2. Waveform for Sequential Execution
Figure 12-3. Waveform for Conditional Branch
Figure 12-4. Waveform for Indirect Jump (Target in Phase A)
Figure 12-5. Waveform for Indirect Jump (Target in Phase B)
Figure 12-6. Waveform for Indirect Jump (During Target PC Output)
Figure 12-7. Waveform for Exception (Target in Phase B)
Figure 12-8. Waveform for Exception (During Target PC Output)
Figure 12-9. Waveform for Exception Generated by Branch or Jump Instruction
Figure 12-10. Waveform for Exception Generated by Branch Delay Slot Instruction
Figure 12-11. Waveform for Exception Generated by Target Instruction
Figure 12-12. Waveform for Back to Back Exceptions (Case I)
Figure 12-13. Waveform for Back to Back Exceptions (Case II)
Figure 13-1. Overall Structure of Hardware Breakpoint
Figure 13-2. Instruction Address Breakpoint Register
Figure 13-3. Instruction Address Breakpoint Mask Register
Figure 13-4. Data Address Breakpoint Register
Figure 13-5. Data Address Breakpoint Mask Register
Figure 13-6. Data Value Breakpoint Register
Figure 13-7. Data Value Breakpoint Mask Register
Figure 13-8. Hardware Breakpoint detection flow (Setting)
Figure 13-9. Hardware Breakpoint detection flow (IAB)
Figure 13-10. Hardware Breakpoint detection flow (DAB/DVB) (1/2)
Figure A-1. CPU Instruction Formats
TABLES
Table 1-1. Restriction List
Table 2-1. Categories of Instructions and How They Are Routed
Table 2-2. Concurrently Issued Instruction Categories
Table 2-3. Coprocessor 0 Registers
Table 3-1. Load / Store Instructions
Table 3-2. Multimedia Load / Store Instructions
Table 3-3. Coprocessor Load / Store Instructions
Table 3-4. Defining Access Types (Big-Endian)
Table 3-5. Defining Access Types (Little-Endian)
Table 3-6. ALU Immediate Instructions
Table 3-7. Three Operand Register-Type Instructions
Table 3-8. Shift Instructions
Table 3-9. Multiply and Divide Instructions
Table 3-10. Jump Instructions Jumping Within a 256 MByte Region
Table 3-11. Jump Instructions to Absolute Address
Table 3-12. PC-Relative Conditional Branch Instructions Comparing 2 Registers
Table 3-13. PC-Relative Conditional Branch Instructions Comparing Against Zero
Table 3-14. Exception Instructions
Table 3-15. Serialization Instructions
Table 3-16. MIPS IV Instructions
Table 3-17. System Control Coprocessor Instructions
Table 3-18. Coprocessor 1 Instructions
Table 3-19. C790-Specific Multiply and Divide Instructions
Table 3-20. Multimedia Instructions
Table 3-21. Latencies and Repeat Rates for User Instruction
Table 4-1. Coprocessor 0 Registers
Table 4-2. Index Register Field Description
Table 4-3. Random Register Fields
Table 4-4. EntryLo0 and EntryLo1 Register Fields
Table 4-5. Context Register Fields
Table 4-6. PageMask Register Field
Table 4-7. Wired Register Field Descriptions
Table 4-8. BadVAddr Register Field
Table 4-9. Count Register Field
Table 4-10. EntryHi Register Fields
Table 4-11. Compare Register Field
Table 4-12. Status Register Fields
Table 4-13. Cause Register Fields
Table 4-14. EPC Register Field
Table 4-15. PRId Register Fields
Table 4-16. Config Register Fields
Table 4-17. BadPAddr Register Fields
Table 4-18. Performance Counter Control Register Fields
Table 4-19. Performance Counter Register 0 Fields
Table 4-20. Performance Counter Register 1 Fields
Table 4-21. TagLo Register Fields
Table 4-22. TagHi Register Fields
Table 4-23. ErrorEPC Register Field
Table 5-1. Exception Levels
Table 5-2. Exception Vectors for Level 1 exceptions
Table 5-3. Exception Vectors for Level 2 exceptions
Table 5-4. Cause.ExcCode Field
Table 5-5. Cause.EXC2 Field
Table 5-6. Masking exceptions
Table 5-7. Exception Priority Order
Table 6-1. Processor Modes
Table 6-2. Address Space
Table 6-3. User Mode Segments
Table 6-4. Supervisor Mode Segments
Table 6-5. Kernel Mode Segments
Table 6-6. TLB Page Coherency (C) Bit Values
Table 6-7. TLB Instructions
Table 7-1. Cache Configuration
Table 7-2. Cache Size and Access Bits
Table 7-3. Data Cache Line States
Table 7-4. LRF Line Replacement Algorithm
Table 7-5. Quadword Retrieved Address PA[5:4]
Table 7-6. UCAB Configuration
Table 7-7. UCAB Size and Access Bits
Table 8-1. System Signal Naming Convention
Table 8-2. Bus Transaction Types
Table 8-3. CPU Transfer Size
Table 8-4. Bus Error Exceptions
Table 8-5. Operation Termination Sequence
Table 9-1. PCCR Register Bits
Table 9-2. Writing Performance Counters and Registers using MTC0
Table 9-3. Reading Performance Counters and Registers using MFC0
Table 9-4. Mnemonics to Access the Performance Counters and Registers
Table 9-5. Counter Events
Table 9-6. Definition of Data Cache Miss
Table 10-1. Floating-Point Control Register Assignments
Table 10-2. FCR0 Fields
Table 10-3. Control/Status Register Fields
Table 10-4. Flush Values of Denormalized Results
Table 10-5. Rounding Mode Bit Decoding
Table 10-6. Equations for Calculating Values in Single and Double-Precision Floating-Point Format
Table 10-7. Floating-Point Format Parameter Values
Table 10-8. Minimum and Maximum Floating-Point Values
Table 10-9. Binary Fixed-Point Format Fields
Table 10-10. FPU Instruction Set (Optional): Load, Move and Store Instruction
Table 10-11. FPU Instruction Set (Optional): Conversion Instruction
Table 10-12. FPU Instruction Set (Optional): Computational Instruction
Table 10-13. FPU Instruction Set (Optional): Compare and Branch Instruction
Table 11-1. Default FPU Exception Actions
Table 11-2. FPU Exception-Causing Conditions
Table 11-3. Values of Overflow Results
Table 12-1. Classification of Branch and Jump Instruction
Table 12-2. Exception Vector Address Codes
Table 13-1. Set a new value into breakpoint registers
Table 13-2. Get the value from breakpoint registers
Table 13-3. BPC Register Fields
Table A-1. Symbols in Instruction Operation Statements
Table A-2. Coprocessor General Register Access Functions
Table A-3. Load and Store Functions
Table A-4. AccessLength Specifications for Loads / Stores
Table A-5. Miscellaneous Functions
Table B-1. Quotient and Remainder Signs
Table C-1. CACHE Instruction Op Field Encoding
Table C-2. Data Tag Status Bit Modifications
Table D-1. FPU Comparisons Without Special Operand Exceptions
Table D-2. FPU Comparisons With Special Operand Exceptions for QNaNs
Handling Precautions
1. Using Toshiba Semiconductors Safely
TOSHIBA is continually working to improve the quality and the reliability of its products. Nevertheless, semiconductor devices in general can malfunction or fail due to their inherent electrical sensitivity and vulnerability to physical stress. It is the responsibility of the buyer, when utilizing TOSHIBA products, to observe standards of safety, and to avoid situations in which a malfunction or failure of a TOSHIBA product could cause loss of human life, bodily injury or damage to property. In developing your designs, please ensure that TOSHIBA products are used within specified operating ranges as set forth in the most recent products specifications. Also, please keep in mind the precautions and conditions set forth in the TOSHIBA Semiconductor Reliability Handbook.
2. Safety Precautions
This section lists important precautions which users of semiconductor devices (and anyone else) should observe in order to avoid injury and damage to property, and to ensure safe and correct use of devices. Please be sure that you understand the meanings of the labels and the graphic symbol described below before you move on to the detailed descriptions of the precautions.
[Explanation of labels]
[label graphic] Indicates an imminently hazardous situation which will result in death or serious injury if you do not follow instructions.
[label graphic] Indicates a potentially hazardous situation which could result in death or serious injury if you do not follow instructions.
[label graphic] Indicates a potentially hazardous situation which, if not avoided, may result in minor injury or moderate injury.
[Explanation of graphic symbol]
Graphic symbol Meaning
Indicates that caution is required (laser beam is dangerous to eyes).
2.1 General Precautions regarding Semiconductor Devices
Do not use devices under conditions exceeding their absolute maximum ratings (e.g. current, voltage, power dissipation or temperature). This may cause the device to break down, degrade its performance, or cause it to catch fire or explode resulting in injury.

Do not insert devices in the wrong orientation. Make sure that the positive and negative terminals of power supplies are connected correctly. Otherwise the rated maximum current or power dissipation may be exceeded and the device may break down or undergo performance degradation, causing it to catch fire or explode and resulting in injury.

When power to a device is on, do not touch the device's heat sink. Heat sinks become hot, so you may burn your hand.

Do not touch the tips of device leads. Because some types of device have leads with pointed tips, you may prick your finger.

When conducting any kind of evaluation, inspection or testing, be sure to connect the testing equipment's electrodes or probes to the pins of the device under test before powering it on. Otherwise, you may receive an electric shock causing injury.

Before grounding an item of measuring equipment or a soldering iron, check that there is no electrical leakage from it. Electrical leakage may cause the device which you are testing or soldering to break down, or could give you an electric shock.

Always wear protective glasses when cutting the leads of a device with clippers or a similar tool. If you do not, small bits of metal flying off the cut ends may damage your eyes.
2.2 Precautions Specific to Each Product Group
2.2.1 Optical semiconductor devices
When a visible semiconductor laser is operating, do not look directly into the laser beam or look through the optical system. This is highly likely to impair vision, and in the worst case may cause blindness. If it is necessary to examine the laser apparatus, for example to inspect its optical characteristics, always wear the appropriate type of laser protective glasses as stipulated by IEC standard IEC825-1.
Ensure that the current flowing in an LED device does not exceed the device's maximum rated current. This is particularly important for resin-packaged LED devices, as excessive current may cause the package resin to blow up, scattering resin fragments and causing injury.

When testing the dielectric strength of a photocoupler, use testing equipment which can shut off the supply voltage to the photocoupler. If you detect a leakage current of more than 100 µA, use the testing equipment to shut off the photocoupler's supply voltage; otherwise a large short-circuit current will flow continuously, and the device may break down or burst into flames, resulting in fire or injury.

When incorporating a visible semiconductor laser into a design, use the device's internal photodetector or a separate photodetector to stabilize the laser's radiant power so as to ensure that laser beams exceeding the laser's rated radiant power cannot be emitted. If this stabilizing mechanism does not work and the rated radiant power is exceeded, the device may break down or the excessively powerful laser beams may cause injury.
2.2.2 Power devices
Never touch a power device while it is powered on. Also, after turning off a power device, do not touch it until it has thoroughly discharged all remaining electrical charge. Touching a power device while it is powered on or still charged could cause a severe electric shock, resulting in death or serious injury.

When conducting any kind of evaluation, inspection or testing, be sure to connect the testing equipment's electrodes or probes to the device under test before powering it on. When you have finished, discharge any electrical charge remaining in the device. Connecting the electrodes or probes of testing equipment to a device while it is powered on may result in electric shock, causing injury.
Do not use devices under conditions which exceed their absolute maximum ratings (current, voltage, power dissipation, temperature etc.). This may cause the device to break down, causing a large short-circuit current to flow, which may in turn cause it to catch fire or explode, resulting in fire or injury.

Use a unit which can detect short-circuit currents and which will shut off the power supply if a short-circuit occurs. If the power supply is not shut off, a large short-circuit current will flow continuously, which may in turn cause the device to catch fire or explode, resulting in fire or injury.

When designing a case for enclosing your system, consider how best to protect the user from shrapnel in the event of the device catching fire or exploding. Flying shrapnel can cause injury.

When conducting any kind of evaluation, inspection or testing, always use protective safety tools such as a cover for the device. Otherwise you may sustain injury caused by the device catching fire or exploding.

Make sure that all metal casings in your design are grounded to earth. Even in modules where a device's electrodes and metal casing are insulated, capacitance in the module may cause the electrostatic potential in the casing to rise. Dielectric breakdown may cause a high voltage to be applied to the casing, causing electric shock and injury to anyone touching it.

When designing the heat radiation and safety features of a system incorporating high-speed rectifiers, remember to take the device's forward and reverse losses into account. The leakage current in these devices is greater than that in ordinary rectifiers; as a result, if a high-speed rectifier is used in an extreme environment (e.g. at high temperature or high voltage), its reverse loss may increase, causing thermal runaway to occur. This may in turn cause the device to explode and scatter shrapnel, resulting in injury to the user.
A design should ensure that, except when the main circuit of the device is active, reverse bias is applied to the device gate while electricity is conducted to control circuits, so that the main circuit will become inactive. Malfunction of the device may cause serious accidents or injuries.
When conducting any kind of evaluation, inspection or testing, either wear protective gloves or wait until the device has cooled properly before handling it. Devices become hot when they are operated. Even after the power has been turned off, the device will retain residual heat which may cause a burn to anyone touching it.
2.2.3 Bipolar ICs (for use in automobiles)
If your design includes an inductive load such as a motor coil, incorporate diodes or similar devices into the design to prevent negative current from flowing in. The load current generated by powering the device on and off may cause it to function erratically or to break down, which could in turn cause injury.

Ensure that the power supply to any device which incorporates protective functions is stable. If the power supply is unstable, the device may operate erratically, preventing the protective functions from working correctly. If protective functions fail, the device may break down causing injury to the user.
3. General Safety Precautions and Usage Considerations
This section is designed to help you gain a better understanding of semiconductor devices, so as to ensure the safety, quality and reliability of the devices which you incorporate into your designs.
3.1 From Incoming to Shipping
3.1.1 Electrostatic discharge (ESD)
When handling individual devices (which are not yet mounted on a printed circuit board), be sure that the environment is protected against static electricity. Operators should wear anti-static clothing, and containers and other objects which come into direct contact with devices should be made of anti-static materials and should be grounded to earth via a 0.5 to 1.0 MΩ protective resistor. Please follow the precautions described below; this is particularly important for devices which are marked "Be careful of static."
(1) Work environment
* When humidity in the working environment decreases, the human body and other insulators can easily become charged with static electricity due to friction. Maintain the recommended humidity of 40% to 60% in the work environment, while also taking into account the fact that moisture-proof-packed products may absorb moisture after unpacking.
* Be sure that all equipment, jigs and tools in the working area are grounded to earth.
* Place a conductive mat over the floor of the work area, or take other appropriate measures, so that the floor surface is protected against static electricity and is grounded to earth. The surface resistivity should be 10^4 to 10^8 Ω/sq and the resistance between surface and ground, 7.5 x 10^5 to 10^8 Ω.
* Cover the workbench surface also with a conductive mat (with a surface resistivity of 10^4 to 10^8 Ω/sq, for a resistance between surface and ground of 7.5 x 10^5 to 10^8 Ω). The purpose of this is to disperse static electricity on the surface (through resistive components) and ground it to earth. Workbench surfaces must not be constructed of low-resistance metallic materials that allow rapid static discharge when a charged device touches them directly.
* Pay attention to the following points when using automatic equipment in your workplace:
(a) When picking up ICs with a vacuum unit, use a conductive rubber fitting on the end of the pick-up wand to protect against electrostatic charge.
(b) Minimize friction on IC package surfaces. If some rubbing is unavoidable due to the device's mechanical structure, minimize the friction plane or use material with a small friction coefficient and low electrical resistance. Also, consider the use of an ionizer.
(c) In sections which come into contact with device lead terminals, use a material which dissipates static electricity.
(d) Ensure that no statically charged bodies (such as work clothes or the human body) touch the devices.
(e) Make sure that sections of the tape carrier which come into contact with installation devices or other electrical machinery are made of a low-resistance material.
(f) Make sure that jigs and tools used in the assembly process do not touch devices.
(g) In processes in which packages may retain an electrostatic charge, use an ionizer to neutralize the ions.
* Make sure that CRT displays in the working area are protected against static charge, for example by a VDT filter. As much as possible, avoid turning displays on and off. Doing so can cause electrostatic induction in devices.
* Keep track of charged potential in the working area by taking periodic measurements.
* Ensure that work chairs are protected by an anti-static textile cover and are grounded to the floor surface by a grounding chain. (Suggested resistance between the seat surface and grounding chain is 7.5 x 10^5 to 10^12 Ω.)
* Install anti-static mats on storage shelf surfaces. (Suggested surface resistivity is 10^4 to 10^8 Ω/sq; suggested resistance between surface and ground is 7.5 x 10^5 to 10^8 Ω.)
* For transport and temporary storage of devices, use containers (boxes, jigs or bags) that are made of anti-static materials or materials which dissipate electrostatic charge.
* Make sure that cart surfaces which come into contact with device packaging are made of materials which will conduct static electricity, and verify that they are grounded to the floor surface via a grounding chain.
* In any location where the level of static electricity is to be closely controlled, the ground resistance level should be Class 3 or above. Use different ground wires for all items of equipment which may come into physical contact with devices.
(2) Operating environment
* Operators must wear anti-static clothing and conductive shoes (or a leg or heel strap).
* Operators must wear a wrist strap grounded to earth via a resistor of about 1 MΩ.
* Soldering irons must be grounded from iron tip to earth, and must be used only at low voltages (6 V to 24 V).
* If the tweezers you use are likely to touch the device terminals, use anti-static tweezers and in particular avoid metallic tweezers. If a charged device touches a low-resistance tool, rapid discharge can occur. When using vacuum tweezers, attach a conductive chucking pad to the tip, and connect it to a dedicated ground used especially for anti-static purposes (suggested resistance value: 10^4 to 10^8 Ω).
* Do not place devices or their containers near sources of strong electrical fields (such as above a CRT).
* When storing printed circuit boards which have devices mounted on them, use a board container or bag that is protected against static charge. To avoid the occurrence of static charge or discharge due to friction, keep the boards separate from one another and do not stack them directly on top of one another.
* Ensure, if possible, that any articles (such as clipboards) which are brought to any location where the level of static electricity must be closely controlled are constructed of anti-static materials.
* In cases where the human body comes into direct contact with a device, be sure to wear anti-static finger covers or gloves (suggested resistance value: 10^8 Ω or less).
* Equipment safety covers installed near devices should have resistance ratings of 10^9 Ω or less.
* If a wrist strap cannot be used for some reason, and there is a possibility of imparting friction to devices, use an ionizer.
* The transport film used in TCP products is manufactured from materials in which static charges tend to build up. When using these products, install an ionizer to prevent the film from being charged with static electricity. Also, ensure that no static electricity will be applied to the product's copper foils by taking measures to prevent static occurring in the peripheral equipment.
3.1.2 Vibration, impact and stress
Handle devices and packaging materials with care. To avoid damage to devices, do not toss or drop packages. Ensure that devices are not subjected to mechanical vibration or shock during transportation. Ceramic package devices and devices in canister-type packages which have empty space inside them are subject to damage from vibration and shock because the bonding wires are secured only at their ends.
Plastic molded devices, on the other hand, have a relatively high level of resistance to vibration and mechanical shock because their bonding wires are enveloped and fixed in resin. However, when any device or package type is installed in target equipment, it is to some extent susceptible to wiring disconnections and other damage from vibration, shock and stressed solder junctions. Therefore when devices are incorporated into the design of equipment which will be subject to vibration, the structural design of the equipment must be thought out carefully. If a device is subjected to especially strong vibration, mechanical shock or stress, the package or the chip itself may crack. In products such as CCDs which incorporate window glass, this could cause surface flaws in the glass or cause the connection between the glass and the ceramic to separate. Furthermore, it is known that stress applied to a semiconductor device through the package changes the resistance characteristics of the chip because of piezoelectric effects. In analog circuit design attention must be paid to the problem of package stress as well as to the dangers of vibration and shock as described above.
3.2 Storage
3.2.1 General storage
* Avoid storage locations where devices will be exposed to moisture or direct sunlight.
* Follow the instructions printed on the device cartons regarding transportation and storage.
* The storage area temperature should be kept within a temperature range of 5°C to 35°C, and relative humidity should be maintained at between 45% and 75%.
* Do not store devices in the presence of harmful (especially corrosive) gases, or in dusty conditions.
* Use storage areas where there is minimal temperature fluctuation. Rapid temperature changes can cause moisture to form on stored devices, resulting in lead oxidation or corrosion. As a result, the solderability of the leads will be degraded.
* When repacking devices, use anti-static containers.
* Do not allow external forces or loads to be applied to devices while they are in storage.
* If devices have been stored for more than two years, their electrical characteristics should be tested and their leads should be tested for ease of soldering before they are used.
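The general storage limits above lend themselves to a simple automated check. The following is a minimal Python sketch, assuming a hypothetical `StorageReading` record and `storage_issues` helper (neither name comes from this document); only the numeric limits are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the general storage guidance above.
TEMP_RANGE_C = (5.0, 35.0)
HUMIDITY_RANGE_PCT = (45.0, 75.0)
RETEST_AFTER_YEARS = 2.0


@dataclass
class StorageReading:
    """Hypothetical record of one storage-area measurement."""
    temperature_c: float
    relative_humidity_pct: float
    years_in_storage: float


def storage_issues(reading: StorageReading) -> List[str]:
    """Return a list of findings; an empty list means no limit was exceeded."""
    issues = []
    low, high = TEMP_RANGE_C
    if not low <= reading.temperature_c <= high:
        issues.append("temperature outside 5 to 35 degrees C")
    low, high = HUMIDITY_RANGE_PCT
    if not low <= reading.relative_humidity_pct <= high:
        issues.append("relative humidity outside 45% to 75%")
    if reading.years_in_storage > RETEST_AFTER_YEARS:
        issues.append("stored more than two years: retest before use")
    return issues
```

A reading such as `StorageReading(25.0, 60.0, 0.5)` produces an empty list. The sketch does not cover the qualitative items in the list (corrosive gases, dust, temperature fluctuation, mechanical load), which still need to be checked separately.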
3.2.2 Moisture-proof packing
Moisture-proof packing should be handled with care. The handling procedure specified for each packing type should be followed scrupulously. If the proper procedures are not followed, the quality and reliability of devices may be degraded. This section describes general precautions for handling moisture-proof packing. Since the details may differ from device to device, refer also to the relevant individual datasheets or databook.
(1) General precautions
Follow the instructions printed on the device cartons regarding transportation and storage.
* Do not drop or toss device packing. The laminated aluminum material in it can be rendered ineffective by rough handling.
* The storage area temperature should be kept within a temperature range of 5°C to 30°C, and relative humidity should be maintained at 90% (max). Use devices within 12 months of the date marked on the package seal.
* If the 12-month storage period has expired, or if the 30% humidity indicator shown in Figure 1 is pink when the packing is opened, it may be advisable, depending on the device and packing type, to bake the devices at high temperature to remove any moisture. Please refer to the table below. After the pack has been opened, use the devices in a 5°C to 30°C, 60% RH environment and within the effective usage period listed on the moisture-proof package. If the effective usage period has expired, or if the packing has been stored in a high-humidity environment, bake the devices at high temperature.

Packing | Moisture removal
Tray | If the packing bears the "Heatproof" marking or indicates the maximum temperature which it can withstand, bake at 125°C for 20 hours. (Some devices require a different procedure.)
Tube | Transfer devices to trays bearing the "Heatproof" marking or indicating the temperature which they can withstand, or to aluminum tubes before baking at 125°C for 20 hours.
Tape | Devices packed on tape cannot be baked and must be used within the effective usage period after unpacking, as specified on the packing.

* When baking devices, protect the devices from static electricity.
* Moisture indicators can detect the approximate humidity level at a standard temperature of 25°C. 6-point indicators and 3-point indicators are currently in use, but eventually all indicators will be 3-point indicators.
[Figure 1 Humidity indicator: (a) 6-point indicator with spots at 10% to 60% in 10% steps, marked "DANGER IF PINK, CHANGE DESICCANT"; (b) 3-point indicator with spots at 20, 30 and 40, marked "DANGER IF PINK". Both are marked "READ AT LAVENDER BETWEEN PINK & BLUE".]
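The moisture-removal guidance in this section amounts to a small decision procedure, which can be sketched in Python as follows. The `Packing` enum and the function names are hypothetical, introduced only for illustration; the 12-month period, the 30% indicator, and the 125°C for 20 hours bake come from the text. Because some devices require a different procedure, the returned strings are suggestions to confirm against the individual datasheet, not a definitive process.

```python
from enum import Enum


class Packing(Enum):
    """Packing types named in the moisture-removal table."""
    TRAY = "tray"
    TUBE = "tube"
    TAPE = "tape"


def needs_moisture_removal(months_since_seal_date: float,
                           indicator_30_is_pink: bool) -> bool:
    """True if the 12-month period has expired or the 30% spot is pink."""
    return months_since_seal_date > 12 or indicator_30_is_pink


def moisture_action(packing: Packing,
                    months_since_seal_date: float,
                    indicator_30_is_pink: bool) -> str:
    """Return a short suggestion for one freshly opened pack."""
    if not needs_moisture_removal(months_since_seal_date, indicator_30_is_pink):
        return "no bake indicated: use within the effective usage period"
    if packing is Packing.TRAY:
        return "bake at 125 C for 20 hours if the tray is marked heatproof"
    if packing is Packing.TUBE:
        return ("transfer to heatproof trays or aluminum tubes, "
                "then bake at 125 C for 20 hours")
    # Tape-packed devices cannot be baked according to the table.
    return "cannot be baked: use only within the effective usage period"
```

For example, `moisture_action(Packing.TUBE, 14, False)` suggests transferring the devices before baking, while any tape-packed lot past its period is flagged as not bakeable.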
3.3 Design
Care must be exercised in the design of electronic equipment to achieve the desired reliability. It is important not only to adhere to specifications concerning absolute maximum ratings and recommended operating conditions, but also to consider the overall environment in which equipment will be used, including factors such as the ambient temperature, transient noise and voltage and current surges, as well as mounting conditions which affect device reliability. This section describes some general precautions which you should observe when designing circuits and when mounting devices on printed circuit boards. For more detailed information about each product family, refer to the relevant individual technical datasheets available from Toshiba.
3.3.1 Absolute maximum ratings
Do not use devices under conditions in which their absolute maximum ratings (e.g. current, voltage, power dissipation or temperature) will be exceeded. A device may break down or its performance may be degraded, causing it to catch fire or explode resulting in injury to the user. The absolute maximum ratings are rated values which must not be exceeded during operation, even for an instant. Although absolute maximum ratings differ from product to product, they essentially concern the voltage and current at each pin, the allowable power dissipation, and the junction and storage temperatures. If the voltage or current on any pin exceeds the absolute maximum rating, the device's internal circuitry can become degraded. In the worst case, heat generated in internal circuitry can fuse wiring or cause the semiconductor chip to break down. If storage or operating temperatures exceed rated values, the package seal can deteriorate or the wires can become disconnected due to the differences between the thermal expansion coefficients of the materials from which the device is constructed.
3.3.2
Recommended operating conditions
The recommended operating conditions for each device are those necessary to guarantee that the device will operate as specified in the datasheet. If greater reliability is required, derate the device's absolute maximum ratings for voltage, current, power and temperature before using it.
3.3.3
Derating
When incorporating a device into your design, reduce its rated absolute maximum voltage, current, power dissipation and operating temperature in order to ensure high reliability. Since derating differs from application to application, refer to the technical datasheets available for the various devices used in your design.
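As a rough illustration of derating, the sketch below scales each absolute maximum rating by a derating factor to obtain a design limit, then checks an operating point against those limits. The factor of 0.8, the rating names and the rating values are hypothetical and are not taken from any datasheet; actual derating values must come from the technical datasheets for the devices used.

```python
# Hypothetical derating helper. The 0.8 factor and the ratings below are
# illustrative assumptions, not values from any Toshiba datasheet.

def derate(abs_max_ratings, factor=0.8):
    """Return design limits obtained by scaling each absolute maximum rating."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("derating factor must be in (0, 1]")
    return {name: value * factor for name, value in abs_max_ratings.items()}


def within_limits(operating_point, limits):
    """True if every operating value is at or below its derated limit."""
    return all(operating_point[name] <= limits[name] for name in limits)


# Example absolute maximum ratings (made-up numbers for illustration only).
ratings = {"voltage_V": 7.0, "current_mA": 50.0, "power_mW": 500.0}
limits = derate(ratings)

operating_point = {"voltage_V": 5.0, "current_mA": 20.0, "power_mW": 300.0}
print(limits)
print(within_limits(operating_point, limits))
```

A simple multiplicative factor is only one possible convention; temperature derating in particular is usually expressed differently, which is why temperature is left out of this sketch.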
3.3.4
Unused pins
If unused pins are left open, some devices can exhibit input instability problems, resulting in malfunctions such as an abrupt increase in current flow. Similarly, if the unused output pins on a device are connected to the power supply pin, the ground pin or to other output pins, the IC may malfunction or break down.
Since the details regarding the handling of unused pins differ from device to device and from pin to pin, please follow the instructions given in the relevant individual datasheets or databook. CMOS logic IC inputs, for example, have extremely high impedance. If an input pin is left open, it can easily pick up extraneous noise and become unstable. In this case, if the input voltage level reaches an intermediate level, it is possible that both the P-channel and N-channel transistors will be turned on, allowing unwanted supply current to flow. Therefore, ensure that the unused input pins of a device are connected to the power supply (Vcc) pin or ground (GND) pin of the same device. For details of what to do with the pins of heat sinks, refer to the relevant technical datasheet and databook.
3.3.5
Latch-up
Latch-up is an abnormal condition inherent in CMOS devices, in which Vcc gets shorted to ground. This happens when a parasitic PNPN junction (thyristor structure) internal to the CMOS chip is turned on, causing a large current of the order of several hundred mA or more to flow between Vcc and GND, eventually causing the device to break down. Latch-up occurs when the input or output voltage exceeds the rated value, causing a large current to flow in the internal chip, or when the voltage on the Vcc (Vdd) pin exceeds its rated value, forcing the internal chip into a breakdown condition. Once the chip falls into the latch-up state, even though the excess voltage may have been applied only for an instant, the large current continues to flow between Vcc (Vdd) and GND (Vss). This causes the device to heat up and, in extreme cases, to emit gas fumes as well. To avoid this problem, observe the following precautions:
(1) Do not allow voltage levels on the input and output pins either to rise above Vcc (Vdd) or to fall below GND (Vss). Also, follow any prescribed power-on sequence, so that power is applied gradually or in steps rather than abruptly.
(2) Do not allow any abnormal noise signals to be applied to the device.
(3) Set the voltage levels of unused input pins to Vcc (Vdd) or GND (Vss).
(4) Do not connect output pins to one another.
3.3.6
Input/Output protection
Wired-AND configurations, in which outputs are connected together, cannot be used, since this short-circuits the outputs. Outputs should, of course, never be connected to Vcc (Vdd) or GND (Vss). Furthermore, ICs with tri-state outputs can undergo performance degradation if a shorted output current is allowed to flow for an extended period of time. Therefore, when designing circuits, make sure that tri-state outputs will not be enabled simultaneously.
3.3.7
Load capacitance
Some devices display increased delay times if the load capacitance is large. Also, large charging and discharging currents will flow in the device, causing noise. Furthermore, since outputs are shorted for a relatively long time, wiring can become fused. Consult the technical information for the device being used to determine the recommended load capacitance.
3.3.8
Thermal design
The failure rate of semiconductor devices is greatly increased as operating temperatures increase. As shown in Figure 2, the internal thermal stress on a device is the sum of the ambient temperature and the temperature rise due to power dissipation in the device. Therefore, to achieve optimum reliability, observe the following precautions concerning thermal design:
(1) Keep the ambient temperature (Ta) as low as possible.
(2) If the device's dynamic power dissipation is relatively large, select the most appropriate circuit board material, and consider the use of heat sinks or of forced air cooling. Such measures will help lower the thermal resistance of the package.
(3) Derate the device's absolute maximum ratings to minimize thermal stress from power dissipation.
$\theta_{ja} = \theta_{jc} + \theta_{ca}$
$\theta_{ja} = (T_j - T_a) / P$
$\theta_{jc} = (T_j - T_c) / P$
$\theta_{ca} = (T_c - T_a) / P$
in which
$\theta_{ja}$ = thermal resistance between junction and surrounding air (°C/W)
$\theta_{jc}$ = thermal resistance between junction and package surface, or internal thermal resistance (°C/W)
$\theta_{ca}$ = thermal resistance between package surface and surrounding air, or external thermal resistance (°C/W)
$T_j$ = junction temperature or chip temperature (°C)
$T_c$ = package surface temperature or case temperature (°C)
$T_a$ = ambient temperature (°C)
$P$ = power dissipation (W)
Figure 2
Thermal resistance of package (diagram labels: Ta, θca, Tc, θjc, Tj)
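The thermal resistance relations in this section can be rearranged to estimate the junction temperature, Tj = Ta + (θjc + θca) × P, and the largest power dissipation that keeps Tj below a chosen limit. The sketch below is a minimal worked example; the function names and all numeric values are hypothetical, and real thermal resistances must be taken from the device datasheet.

```python
# Worked example of the thermal resistance relations.
# All numeric values are illustrative assumptions, not datasheet figures.

def junction_temperature(ta_c, power_w, theta_jc, theta_ca):
    """Tj = Ta + (theta_jc + theta_ca) * P, temperatures in deg C."""
    theta_ja = theta_jc + theta_ca
    return ta_c + theta_ja * power_w


def max_power(tj_limit_c, ta_c, theta_ja):
    """Largest P (W) that keeps Tj at or below tj_limit_c: (Tj - Ta) / theta_ja."""
    return (tj_limit_c - ta_c) / theta_ja


# Hypothetical device: theta_jc = 10 C/W, theta_ca = 30 C/W, 2 W at 50 C ambient.
tj = junction_temperature(ta_c=50.0, power_w=2.0, theta_jc=10.0, theta_ca=30.0)
print(tj)

# Power budget for a hypothetical 125 C junction limit at the same ambient.
print(max_power(tj_limit_c=125.0, ta_c=50.0, theta_ja=40.0))
```

Lowering θca with a heat sink or forced air cooling raises the power budget for the same junction limit, which is the effect precaution (2) relies on.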
3.3.9
Interfacing
When connecting inputs and outputs between devices, make sure input voltage (VIL/VIH) and output voltage (VOL/VOH) levels are matched. Otherwise, the devices may malfunction. When connecting devices operating at different supply voltages, such as in a dual-power-supply system, be aware that erroneous power-on and power-off sequences can result in device breakdown. For details of how to interface particular devices, consult the relevant technical datasheets and databooks. If you have any questions or doubts about interfacing, contact your nearest Toshiba office or distributor.
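One way to check that levels are matched is to compute the noise margins between a driving output and a receiving input: the high margin is VOH(min) minus VIH(min), and the low margin is VIL(max) minus VOL(max). Both must be non-negative for the levels to be compatible. The sketch below assumes this common convention; the function names and voltage values are illustrative and do not describe any particular Toshiba device.

```python
# Noise-margin check between a driver output and a receiver input.
# Voltage values used below are illustrative assumptions only.

def noise_margins(voh_min, vol_max, vih_min, vil_max):
    """Return (high margin, low margin) in volts."""
    return voh_min - vih_min, vil_max - vol_max


def levels_compatible(voh_min, vol_max, vih_min, vil_max):
    """True if both noise margins are non-negative."""
    high, low = noise_margins(voh_min, vol_max, vih_min, vil_max)
    return high >= 0.0 and low >= 0.0


# Hypothetical driver: VOH(min) = 2.4 V, VOL(max) = 0.4 V.
# Hypothetical receiver A: VIH(min) = 2.0 V, VIL(max) = 0.8 V.
print(noise_margins(2.4, 0.4, 2.0, 0.8))
print(levels_compatible(2.4, 0.4, 2.0, 0.8))

# Hypothetical receiver B needs VIH(min) = 3.5 V, so the same driver fails.
print(levels_compatible(2.4, 0.4, 3.5, 1.5))
```

A positive margin on paper does not cover power-on and power-off sequencing between different supply voltages, which still has to be checked against the relevant datasheets.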
3.3.10
Decoupling
Spike currents generated during switching can cause Vcc (Vdd) and GND (Vss) voltage levels to fluctuate, causing ringing in the output waveform or a delay in response speed. (The power supply and GND wiring impedance is normally 50 Ω to 100 Ω.) For this reason, the impedance of power supply lines with respect to high frequencies must be kept low. This can be accomplished by using thick and short wiring for the Vcc (Vdd) and GND (Vss) lines and by installing decoupling capacitors (of approximately 0.01 μF to 1 μF capacitance) as high-frequency filters between Vcc (Vdd) and GND (Vss) at strategic locations on the printed circuit board. For low-frequency filtering, it is a good idea to install a 10 μF to 100 μF capacitor on the printed circuit board (one capacitor will suffice). If the capacitance is excessively large, however, (e.g. several thousand μF) latch-up can be a problem. Be sure to choose an appropriate capacitance value.
An important point about wiring is that, in the case of high-speed logic ICs, noise is caused mainly by reflection and crosstalk, or by the power supply impedance. Reflections cause increased signal delay, ringing, overshoot and undershoot, thereby reducing the device's safety margins with respect to noise. To prevent reflections, reduce the wiring length by increasing the device mounting density so as to lower the inductance (L) and capacitance (C) in the wiring. Extreme care must be taken, however, when taking this corrective measure, since it tends to cause crosstalk between the wires. In practice, there must be a trade-off between these two factors.
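To see why small capacitors serve as high-frequency filters, the sketch below evaluates the impedance magnitude of an ideal capacitor, |Z| = 1 / (2πfC), for the capacitance range mentioned above. This is a simplification: real capacitors also have series resistance and inductance, which are ignored here, and the 10 MHz evaluation frequency is an arbitrary assumption.

```python
import math

# Impedance magnitude of an ideal capacitor: |Z| = 1 / (2 * pi * f * C).
# Parasitic series resistance and inductance are deliberately ignored.

def capacitor_impedance_ohms(frequency_hz, capacitance_f):
    """Return |Z| in ohms for an ideal capacitor at the given frequency."""
    if frequency_hz <= 0 or capacitance_f <= 0:
        raise ValueError("frequency and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * frequency_hz * capacitance_f)


FREQ_HZ = 10e6  # hypothetical switching-noise frequency chosen for illustration

for label, cap in (("0.01 uF", 0.01e-6), ("1 uF", 1e-6), ("10 uF", 10e-6)):
    z = capacitor_impedance_ohms(FREQ_HZ, cap)
    print(f"{label}: {z:.4f} ohm at {FREQ_HZ / 1e6:.0f} MHz")
```

Under these idealized assumptions every value in the range presents an impedance far below the 50 Ω to 100 Ω wiring impedance, which is what allows the capacitor to supply spike currents locally.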
3.3.11
External noise
Printed circuit boards with long I/O or signal pattern lines are vulnerable to induced noise or surges from outside sources. Consequently, malfunctions or breakdowns can result from overcurrent or overvoltage, depending on the types of device used. To protect against noise, lower the impedance of the pattern line or insert a noise-canceling circuit. Protective measures must also be taken against surges. For details of the appropriate protective measures for a particular device, consult the relevant databook.
3.3.12
Electromagnetic interference
Widespread use of electrical and electronic equipment in recent years has brought with it radio and TV reception problems due to electromagnetic interference. To use the radio spectrum effectively and to maintain radio communications quality, each country has formulated regulations limiting the amount of electromagnetic interference which can be generated by individual products. Electromagnetic interference includes conduction noise propagated through power supply and telephone lines, and noise from direct electromagnetic waves radiated by equipment. Different measurement methods and corrective measures are used to assess and counteract each specific type of noise. Difficulties in controlling electromagnetic interference derive from the fact that there is no method available which allows designers to calculate, at the design stage, the strength of the electromagnetic waves which will emanate from each component in a piece of equipment. For this reason, it is only after the prototype equipment has been completed that the designer can take measurements using a dedicated instrument to determine the strength of electromagnetic interference waves. Yet it is possible during system design to incorporate some measures for the prevention of electromagnetic interference, which can facilitate taking corrective measures once the design has been completed. These include installing shields and noise filters, and increasing
the thickness of the power supply wiring patterns on the printed circuit board. One effective method, for example, is to devise several shielding options during design, and then select the most suitable shielding method based on the results of measurements taken after the prototype has been completed.
3.3.13
Peripheral circuits
In most cases semiconductor devices are used with peripheral circuits and components. The input and output signal voltages and currents in these circuits must be chosen to match the semiconductor device's specifications. The following factors must be taken into account. (1) Inappropriate voltages or currents applied to a device's input pins may cause it to operate erratically. Some devices contain pull-up or pull-down resistors. When designing your system, remember to take the effect of this on the voltage and current levels into account. (2) The output pins on a device have a predetermined external circuit drive capability. If this drive capability is greater than that required, either incorporate a compensating circuit into your design or carefully select suitable components for use in external circuits.
3.3.14
Safety standards
Each country has safety standards which must be observed. These safety standards include requirements for quality assurance systems and design of device insulation. Such requirements must be fully taken into account to ensure that your design conforms to the applicable safety standards.
3.3.15
Other precautions
(1) When designing a system, be sure to incorporate fail-safe and other appropriate measures according to the intended purpose of your system. Also, be sure to debug your system under actual board-mounted conditions. (2) If a plastic-package device is placed in a strong electric field, surface leakage may occur due to the charge-up phenomenon, resulting in device malfunction. In such cases take appropriate measures to prevent this problem, for example by protecting the package surface with a conductive shield. (3) With some microcomputers and MOS memory devices, caution is required when powering on or resetting the device. To ensure that your design does not violate device specifications, consult the relevant databook for each constituent device. (4) Ensure that no conductive material or object (such as a metal pin) can drop onto and short the leads of a device mounted on a printed circuit board.
3.4
Inspection, Testing and Evaluation
3.4.1
Grounding
Ground all measuring instruments, jigs, tools and soldering irons to earth. Electrical leakage may cause a device to break down or may result in electric shock.
3.4.2
Inspection Sequence
Do not insert devices in the wrong orientation. Make sure that the positive and negative electrodes of the power supply are correctly connected. Otherwise, the rated maximum current or maximum power dissipation may be exceeded and the device may break down or undergo performance degradation, causing it to catch fire or explode, resulting in injury to the user. When conducting any kind of evaluation, inspection or testing using AC power with a peak voltage of 42.4 V or DC power exceeding 60 V, be sure to connect the electrodes or probes of the testing equipment to the device under test before powering it on. Connecting the electrodes or probes of testing equipment to a device while it is powered on may result in electric shock, causing injury. (1) Apply voltage to the test jig only after inserting the device securely into it. When applying or removing power, observe the relevant precautions, if any. (2) Make sure that the voltage applied to the device is off before removing the device from the test jig. Otherwise, the device may undergo performance degradation or be destroyed. (3) Make sure that no surge voltages from the measuring equipment are applied to the device. (4) The chips housed in tape carrier packages (TCPs) are bare chips and are therefore exposed. During inspection take care not to crack the chip or cause any flaws in it. Electrical contact may also cause a chip to become faulty. Therefore make sure that nothing comes into electrical contact with the chip.
3.5
Mounting
There are essentially two main types of semiconductor device package: lead insertion and surface mount. During mounting on printed circuit boards, devices can become contaminated by flux or damaged by thermal stress from the soldering process. With surface-mount devices in particular, the most significant problem is thermal stress from solder reflow, when the entire package is subjected to heat. This section describes a recommended temperature profile for each mounting method, as well as general precautions which you should take when mounting devices on printed circuit boards. Note, however, that even for devices with the same package type, the appropriate mounting method varies according to the size of the chip and the size and shape of the lead frame. Therefore, please consult the relevant technical datasheet and databook.
3.5.1
Lead forming
Always wear protective glasses when cutting the leads of a device with clippers or a similar tool. If you do not, small bits of metal flying off the cut ends may damage your eyes.
Do not touch the tips of device leads. Because some types of device have leads with pointed tips, you may prick your finger.
Semiconductor devices must undergo a process in which the leads are cut and formed before the devices can be mounted on a printed circuit board. If undue stress is applied to the interior of a device during this process, mechanical breakdown or performance degradation can result. This is attributable primarily to differences between the stress on the device's external leads and the stress on the internal leads. If the relative difference is great enough, the device's internal leads, adhesive properties or sealant can be damaged. Observe these precautions during the lead-forming process (this does not apply to surface-mount devices):
(1) Lead insertion hole intervals on the printed circuit board should match the lead pitch of the device precisely.
(2) If lead insertion hole intervals on the printed circuit board do not precisely match the lead pitch of the device, do not attempt to forcibly insert devices by pressing on them or by pulling on their leads.
(3) For the minimum clearance specification between a device and a printed circuit board, refer to the relevant device's datasheet and databook. If necessary, achieve the required clearance by forming the device's leads appropriately. Do not use the spacers which are used to raise devices above the surface of the printed circuit board during soldering to achieve clearance. These spacers normally continue to expand due to heat, even after the solder has begun to solidify; this applies severe stress to the device.
(4) Observe the following precautions when forming the leads of a device prior to mounting.
* Use a tool or jig to secure the lead at its base (where the lead meets the device package) while bending so as to avoid mechanical stress to the device. Also avoid bending or stretching device leads repeatedly.
* Be careful not to damage the lead during lead forming.
* Follow any other precautions described in the individual datasheets and databooks for each device and package type.
3.5.2
Socket mounting
(1) When socket mounting devices on a printed circuit board, use sockets which match the inserted device's package. (2) Use sockets whose contacts have the appropriate contact pressure. If the contact pressure is insufficient, the socket may not make a perfect contact when the device is repeatedly inserted and removed; if the pressure is excessively high, the device leads may be bent or damaged when they are inserted into or removed from the socket. (3) When soldering sockets to the printed circuit board, use sockets whose construction prevents flux from penetrating into the contacts or which allows flux to be completely cleaned off. (4) Make sure the coating agent applied to the printed circuit board for moisture-proofing purposes does not stick to the socket contacts. (5) If the device leads are severely bent by a socket as it is inserted or removed and you wish to repair the leads so as to continue using the device, make sure that this lead correction is only performed once. Do not use devices whose leads have been corrected more than once. (6) If the printed circuit board with the devices mounted on it will be subjected to vibration from external sources, use sockets which have a strong contact pressure so as to prevent the sockets and devices from vibrating relative to one another.
3.5.3
Soldering temperature profile
The soldering temperature and heating time vary from device to device. Therefore, when specifying the mounting conditions, refer to the individual datasheets and databooks for the devices used.
(1) Using a soldering iron
Complete soldering within ten seconds for lead temperatures of up to 260°C, or within three seconds for lead temperatures of up to 350°C.
(2) Using medium infrared ray reflow
* Heating top and bottom with long or medium infrared rays is recommended (see Figure 3).
[Figure 3 diagram labels: medium infrared ray heater (reflow); long infrared ray heater (preheating); product flow]
Figure 3
Heating top and bottom with long or medium infrared rays
* Complete the infrared ray reflow process within 30 seconds at a package surface temperature of between 210°C and 240°C.
* Refer to Figure 4 for an example of a good temperature profile for infrared or hot air reflow.
[Figure 4 plot labels: package surface temperature (°C) at 140, 160, 210 and 240; time (in seconds); 60-120 seconds; 30 seconds or less]
Figure 4
Sample temperature profile for infrared or hot air reflow
(3) Using hot air reflow
* Complete hot air reflow within 30 seconds at a package surface temperature of between 210°C and 240°C.
* For an example of a recommended temperature profile, refer to Figure 4 above.
(4) Using solder flow
* Apply preheating for 60 to 120 seconds at a temperature of 150°C.
* For lead insertion-type packages, complete solder flow within 10 seconds with the temperature at the stopper (or, if there is no stopper, at a location more than 1.5 mm from the body) which does not exceed 260°C.
* For surface-mount packages, complete soldering within 5 seconds at a temperature of 250°C or less in order to prevent thermal stress in the device.
* Figure 5 shows an example of a recommended temperature profile for surface-mount packages using solder flow.
[Figure 5 plot labels: package surface temperature (°C) at 140, 160 and 250; time (in seconds); 60-120 seconds; 5 seconds or less]
Figure 5
Sample temperature profile for solder flow
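A logged package surface temperature profile can be checked against the infrared or hot air reflow limits given above (no more than 30 seconds between 210°C and 240°C). The sketch below is one possible check, not a Toshiba procedure: the function names, the sample format of (time in seconds, temperature in °C) pairs and the example readings are all assumptions, and the time above threshold is only a coarse estimate based on whole sampling intervals.

```python
# Coarse check of a logged reflow profile against the limits in this section.
# Sample data and function names are hypothetical.

def seconds_at_or_above(samples, threshold_c):
    """Sum the durations of sampling intervals whose two endpoints are both
    at or above threshold_c. This is an approximation of the true time."""
    total = 0.0
    for (t0, temp0), (t1, temp1) in zip(samples, samples[1:]):
        if temp0 >= threshold_c and temp1 >= threshold_c:
            total += t1 - t0
    return total


def check_reflow(samples, reflow_threshold_c=210.0, peak_limit_c=240.0,
                 max_reflow_s=30.0):
    """Return the peak temperature, the estimated reflow time and pass flags."""
    peak = max(temp for _, temp in samples)
    reflow_s = seconds_at_or_above(samples, reflow_threshold_c)
    return {
        "peak_c": peak,
        "reflow_s": reflow_s,
        "peak_ok": peak <= peak_limit_c,
        "reflow_time_ok": reflow_s <= max_reflow_s,
    }


# Made-up readings: (time in seconds, package surface temperature in deg C).
profile = [(0, 25), (60, 140), (150, 160), (170, 210), (180, 235),
           (190, 215), (195, 210), (200, 180), (260, 60)]
result = check_reflow(profile)
print(result)
```

The same functions can be reused for solder flow on surface-mount packages by passing the limits stated for that method, for example a 250°C peak limit and a 5 second time limit with a threshold chosen by the user.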
3.5.4
Flux cleaning and ultrasonic cleaning
(1) When cleaning circuit boards to remove flux, make sure that no residual reactive ions such as Na or Cl remain. Note that organic solvents react with water to generate hydrogen chloride and other corrosive gases which can degrade device performance.
(2) Washing devices with water will not cause any problems. However, make sure that no reactive ions such as sodium and chlorine are left as a residue. Also, be sure to dry devices sufficiently after washing.
(3) Do not rub device markings with a brush or with your hand during cleaning or while the devices are still wet from the cleaning agent. Doing so can rub off the markings.
(4) The dip cleaning, shower cleaning and steam cleaning processes all involve the chemical action of a solvent. Use only recommended solvents for these cleaning methods. When immersing devices in a solvent or steam bath, make sure that the temperature of the liquid is 50°C or below, and that the circuit board is removed from the bath within one minute.
(5) Ultrasonic cleaning should not be used with hermetically-sealed ceramic packages such as a leadless chip carrier (LCC), pin grid array (PGA) or charge-coupled device (CCD), because the bonding wires can become disconnected due to resonance during the cleaning process. Even if a device package allows ultrasonic cleaning, limit the duration of ultrasonic cleaning to as short a time as possible, since long hours of ultrasonic cleaning degrade the adhesion between the mold resin and the frame material. The following ultrasonic cleaning conditions are recommended:
Frequency: 27 kHz to 29 kHz
Ultrasonic output power: 300 W or less (0.25 W/cm² or less)
Cleaning time: 30 seconds or less
Suspend the circuit board in the solvent bath during ultrasonic cleaning in such a way that the ultrasonic vibrator does not come into direct contact with the circuit board or the device.
3.5.5
No cleaning
If analog devices or high-speed devices are used without being cleaned, flux residues may cause minute amounts of leakage between pins. Similarly, dew condensation, which occurs in environments containing residual chlorine when power to the device is on, may cause between-lead leakage or migration. Therefore, Toshiba recommends that these devices be cleaned. However, if the flux used contains only a small amount of halogen (0.05 wt% or less), the devices may be used without cleaning.
3.5.6
Mounting tape carrier packages (TCPs)
(1) When tape carrier packages (TCPs) are mounted, measures must be taken to prevent electrostatic breakdown of the devices. (2) If devices are being picked up from tape, or outer lead bonding (OLB) mounting is being carried out, consult the manufacturer of the insertion machine which is being used, in order to establish the optimum mounting conditions in advance and to avoid any possible hazards. (3) The base film, which is made of polyimide, is hard and thin. Be careful not to cut or scratch your hands or any objects while handling the tape. (4) When punching tape, try not to scatter broken pieces of tape too much. (5) Treat the extra film, reels and spacers left after punching as industrial waste, taking care not to destroy or pollute the environment. (6) Chips housed in tape carrier packages (TCPs) are bare chips and therefore have their reverse side exposed. To ensure that the chip will not be cracked during mounting, ensure that no mechanical shock is applied to the reverse side of the chip. Electrical contact may also cause a chip to fail. Therefore, when mounting devices, make sure that nothing comes into electrical contact with the reverse side of the chip. If your design requires connecting the reverse side of the chip to the circuit board, please consult Toshiba or a Toshiba distributor beforehand.
3.5.7
Mounting chips
Devices delivered in chip form tend to degrade or break under external forces much more easily than plastic-packaged devices. Therefore, caution is required when handling this type of device.
(1) Mount devices in a properly prepared environment so that chip surfaces will not be exposed to polluted ambient air or other polluted substances.
(2) When handling chips, be careful not to expose them to static electricity. In particular, measures must be taken to prevent static damage during the mounting of chips. With this in mind, Toshiba recommends mounting all peripheral parts first and then mounting chips last (after all other components have been mounted).
(3) Make sure that PCBs (or any other kind of circuit board) on which chips are being mounted do not have any chemical residues on them (such as the chemicals which were used for etching the PCBs).
(4) When mounting chips on a board, use the method of assembly that is most suitable for maintaining the appropriate electrical, thermal and mechanical properties of the semiconductor devices used.
* For details of devices in chip form, refer to the relevant device's individual datasheets.
3.5.8
Circuit board coating
When devices are to be used in equipment requiring a high degree of reliability or in extreme environments (where moisture, corrosive gas or dust is present), circuit boards may be coated for protection. However, before doing so, you must carefully consider the possible stress and contamination effects that may result and then choose the coating resin which results in the minimum level of stress to the device.
3.5.9
Heat sinks
(1) When attaching a heat sink to a device, be careful not to apply excessive force to the device in the process. (2) When attaching a device to a heat sink by fixing it at two or more locations, evenly tighten all the screws in stages (i.e. do not fully tighten one screw while the rest are still only loosely tightened). Finally, fully tighten all the screws up to the specified torque. (3) Drill holes for screws in the heat sink exactly as specified. Smooth the surface by removing burrs and protrusions or indentations which might interfere with the installation of any part of the device. (4) A coating of silicone compound can be applied between the heat sink and the device to improve heat conductivity. Be sure to apply the coating thinly and evenly; do not use too much. Also, be sure to use a non-volatile compound, as volatile compounds can crack after a time, causing the heat radiation properties of the heat sink to deteriorate. (5) If the device is housed in a plastic package, use caution when selecting the type of silicone compound to be applied between the heat sink and the device. With some types, the base oil separates and penetrates the plastic package, significantly reducing the useful life of the device. Two recommended silicone compounds in which base oil separation is not a problem are YG6260 from Toshiba Silicone. (6) Heat-sink-equipped devices can become very hot during operation. Do not touch them, or you may sustain a burn.
3.5.10
Tightening torque
(1) Make sure the screws are tightened with fastening torques not exceeding the torque values stipulated in individual datasheets and databooks for the devices used. (2) Do not allow a power screwdriver (electrical or air-driven) to touch devices.
3.5.11
Repeated device mounting and usage
Do not remount or re-use devices which fall into the categories listed below; these devices may cause significant problems relating to performance and reliability. (1) Devices which have been removed from the board after soldering (2) Devices which have been inserted in the wrong orientation or which have had reverse current applied (3) Devices which have undergone lead forming more than once
3.6
Protecting Devices in the Field
3.6.1
Temperature
Semiconductor devices are generally more sensitive to temperature than are other electronic components. The various electrical characteristics of a semiconductor device are dependent on the ambient temperature at which the device is used. It is therefore necessary to understand the temperature characteristics of a device and to incorporate device derating into circuit design. Note also that if a device is used above its maximum temperature rating, device deterioration is more rapid and it will reach the end of its usable life sooner than expected.
3.6.2
Humidity
Resin-molded devices are sometimes improperly sealed. When these devices are used for an extended period of time in a high-humidity environment, moisture can penetrate into the device and cause chip degradation or malfunction. Furthermore, when devices are mounted on a regular printed circuit board, the impedance between wiring components can decrease under high-humidity conditions. In systems which require a high signal-source impedance, circuit board leakage or leakage between device lead pins can cause malfunctions. The application of a moisture-proof treatment to the device surface should be considered in this case. On the other hand, operation under low-humidity conditions can damage a device due to the occurrence of electrostatic discharge. Unless damp-proofing measures have been specifically taken, use devices only in environments with appropriate ambient moisture levels (i.e. within a relative humidity range of 40% to 60%).
3.6.3
Corrosive gases
Corrosive gases can cause chemical reactions in devices, degrading device characteristics. For example, sulphur-bearing corrosive gases emanating from rubber placed near a device (accompanied by condensation under high-humidity conditions) can corrode a device's leads. The resulting chemical reaction between leads forms foreign particles which can cause electrical leakage.
3.6.4
Radioactive and cosmic rays
Most industrial and consumer semiconductor devices are not designed with protection against radioactive and cosmic rays. Devices used in aerospace equipment or in radioactive environments must therefore be shielded.
3.6.5
Strong electrical and magnetic fields
Devices exposed to strong magnetic fields can undergo a polarization phenomenon in their plastic material, or within the chip, which gives rise to abnormal symptoms such as impedance changes or increased leakage current. Failures have been reported in LSIs mounted near malfunctioning deflection yokes in TV sets. In such cases the device's installation location must be changed or the device must be shielded against the electrical or magnetic field. Shielding against magnetism is especially necessary for devices used in an alternating magnetic field because of the electromotive forces generated in this type of environment.
3.6.6
Interference from light (ultraviolet rays, sunlight, fluorescent lamps and incandescent lamps)
Light striking a semiconductor device generates electromotive force due to photoelectric effects. In some cases the device can malfunction. This is especially true for devices in which the internal chip is exposed. When designing circuits, make sure that devices are protected against incident light from external sources. This problem is not limited to optical semiconductors and EPROMs. All types of device can be affected by light.
3.6.7
Dust and oil
Just like corrosive gases, dust and oil can cause chemical reactions in devices, which will adversely affect a device's electrical characteristics. To avoid this problem, do not use devices in dusty or oily environments. This is especially important for optical devices because dust and oil can affect a device's optical characteristics as well as its physical integrity and the electrical performance factors mentioned above.
3.6.8
Fire
Semiconductor devices are combustible; they can emit smoke and catch fire if heated sufficiently. When this happens, some devices may generate poisonous gases. Devices should therefore never be used in close proximity to an open flame or a heat-generating body, or near flammable or combustible materials.
3.7
Disposal of devices and packing materials
When discarding unused devices and packing materials, follow all procedures specified by local regulations in order to protect the environment against contamination.
4.
Precautions and Usage Considerations
This section describes matters specific to each product group which need to be taken into consideration when using devices. If the same item is described in Sections 3 and 4, the description in Section 4 takes precedence.
4.1 Microcontrollers
4.1.1 Design
(1) Using resonators which are not specifically recommended for use
Resonators recommended for use with Toshiba products in microcontroller oscillator applications are listed in Toshiba databooks along with information about oscillation conditions. If you use a resonator not included in this list, please consult Toshiba or the resonator manufacturer concerning the suitability of the device for your application.
(2) Undefined functions
In some microcontrollers certain instruction code values do not constitute valid processor instructions. Also, it is possible that the values of bits in registers will become undefined. Take care in your applications not to use invalid instructions or to let register bit values become undefined.
Chapter 1 Introduction
1. Introduction
This user's manual describes the C790 superscalar microprocessor for the system designer, paying special attention to the software interface and the bus interface. The C790 is a superscalar integrated implementation of a subset of the 64-bit MIPS IV Instruction Set Architecture. It also implements a large extension to this instruction set specially tailored for multimedia applications. It contains a CPU, a floating point execution unit (Coprocessor 1), and primary instruction and data caches. Two instructions can be decoded each cycle. These instructions are issued in order and are always completed in order, although some instructions are retired out of order. Data cache misses are non-blocking: a single outstanding cache miss does not stall the pipeline, so load misses and uncached loads are retired out of order. Multiply, Multiply-Accumulate, Divide, Prefetch, and Coprocessor 1 instructions are also retired out of order.
1.1 Features
The C790 core has the following features:
* 2-way superscalar pipeline
* 128-bit (two 64-bit) data path and 128-bit system bus
* Instruction set architecture
  * 64-bit MIPS III instruction set implementation (except LL, SC, LLD and SCD)
  * Selected MIPS IV instruction set implementation (Prefetch and Move conditional instructions)
  * Three-operand Multiply and Multiply-Accumulate instructions
  * 128-bit (Quadword) load/store instructions
  * 128-bit multimedia instructions which configure the 128-bit data path as two 64-bit, four 32-bit, eight 16-bit or sixteen 8-bit paths
* Configurable Endianness
* Branch prediction with Branch History Table (BHT) and Branch Target Address Cache (BTAC)
* Large on-chip caches
  * Instruction cache: 32 KB, 2-way set-associative
  * Data cache: 32 KB, 2-way set-associative (with write-back protocol)
  * Non-blocking load, hit under miss and early restart on first quadword
  * Data cache line locking
  * Prefetch functions
  * 64-byte cache line
* Fast integer Multiply and Multiply-Accumulate operations
* Memory management unit
  * 48-entry (96 pages) fully associative translation look-aside buffer (TLB)
  * 32-bit physical address space and 32-bit virtual address space
* IEEE754-1985 compatible FPU (MIPS III ISA supported)
* Performance counters supported
* Debug support
  * Multi-stepping of instruction execution
  * Hardware breakpoint on instruction addresses
  * Hardware breakpoint on data address and data value
  * PC tracing capability
* 128-bit demultiplexed data bus and 32-bit address bus
  * Pipelined addresses
  * Bus error supported
  * Multiple masters supported
1.2 Related Documents
The following documents should be referenced:
[1] MIPS R4000 Microprocessor User's Manual
[2] MIPS R10000 Microprocessor User's Manual
[3] MIPS IV Instruction Set (Revision 3.2)
1.3 Revision History
Rev. 1.0: June 24th, 1999
Rev. 1.1: December 25th, 1999  Added IEEE754 compatible FPU feature (both single- and double-precision)
Rev. 1.2: March, 2000  Published
Rev. 2.0: April, 2001  Fixed many typos
1.4 Conventions Used in This Manual
The names of registers, fields, and instructions are italicized as in this example:
   The Status register (SR) is a read/write register that contains the operating mode, interrupt enabling, and diagnostic states of the processor.
When a name is first introduced, it is shown in bold type.
Ranges are denoted by a colon as in the following example:
   The 4-bit Coprocessor Usability (CU[3:0]) field controls the usability of four possible coprocessors.
Conventions used in instruction descriptions are defined at the beginning of Appendices A, B, C, and D.
1.5 Restrictions for Use of the C790 CPU Core
1. Revision History
Revision  Date      Contents
1.0       4/2/2001  FLX01-FLX06; Restrictions for User's Manual Rev.2.0
Items 1 through 6 in the description below are the restrictions that must be obeyed when using the C790 CPU core (User's Manual Rev.2.0).
Table 1-1. Restriction List
ID     Contents
FLX01  TLB exceptions mask bus errors.
FLX02  Bus errors are masked when Status.ERL==1 or Status.EXL==1.
FLX03  AdEL occurs in index-type ICACHE or BTAC CACHE instructions.
FLX04  kuseg becomes an uncached area when an error exception (Status.ERL==1) occurs.
FLX05  First two instructions in an exception handler are executed as NOP when a bus error occurs.
FLX06  Unexpected instruction-fetch bus errors occur when executing a Crashme program.
2. Description
2.1 TLB exceptions mask bus errors (FLX01)
2.1.1 Phenomenon
There are cases in which TLB exceptions occurring immediately after a bus error mask the bus error and the bus error can not be detected.
2.1.2 Corrective measures
This is caused by bus error exceptions having a lower priority than TLB exceptions in instruction fetch and data access (refer to "5.5.1 Exception Priority"). Check the following when programming a TLB exception handler.
1) In the TLB exception handler, check for the occurrence of any bus error exception before a page refill.
2) In the TLB exception handler, check for the occurrence of any bus error exception if the page that should be refilled is incorrect.
3) In the TLB exception handler, execute with Status.EXL==0 and Status.ERL==0 after the handler stores the EPC, Cause, and Status registers. Pending bus errors can be confirmed by referring to Status.BEM.
2.2 Bus errors are masked when Status.ERL==1 or Status.EXL = 1 (FLX02)
2.2.1 Phenomenon
Even if a bus error occurs during instruction fetch in an exception handler (Status.EXL==1 or Status.ERL==1), the CPU does not accept the exception and executes instruction code with indeterminate values read from the bus.
2.2.2 Corrective measures
This is caused by bus error exceptions being masked by Status.EXL==1 or Status.ERL==1. Do not cause exceptions due to instruction fetch when Status.EXL==1 or Status.ERL==1. Generating exceptions in an exception handler is dangerous. For example:
1) The JR instruction may potentially cause an address error or a bus error. Do not use the JR instruction when Status.EXL==1 or Status.ERL==1.
2) A mapped region may potentially cause a TLB exception. Be sure to execute using an unmapped region such as the following:
   0x8000_0000 - 0x9FFF_FFFF: kseg0
   0xA000_0000 - 0xBFFF_FFFF: kseg1
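As an illustration only, the following Python sketch classifies a 32-bit virtual address against the two unmapped regions listed above. The function name `unmapped_segment` is hypothetical and not part of any Toshiba tool; it could be used, for example, in a build script that checks where exception handler code is linked.

```python
# Unmapped kernel segments, using the address ranges quoted in the text above.
UNMAPPED_SEGMENTS = {
    "kseg0": (0x8000_0000, 0x9FFF_FFFF),
    "kseg1": (0xA000_0000, 0xBFFF_FFFF),
}


def unmapped_segment(va):
    """Return "kseg0" or "kseg1" if the address is unmapped, else None."""
    va &= 0xFFFF_FFFF  # keep the low 32 bits of the address
    for name, (low, high) in UNMAPPED_SEGMENTS.items():
        if low <= va <= high:
            return name
    return None
```

An address for which the function returns None lies in a mapped region and may therefore raise a TLB exception.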
2.3 AdEL occurs in index-type ICACHE or BTAC CACHE instructions (FLX03)
2.3.1 Phenomenon
When executing the index-type CACHE instructions below in either User mode or Supervisor mode, operation occasionally becomes undefined and generates AdEL (Address Error exception; load and instruction fetch). There are five index-type ICACHE sub operations as listed below.
   00111  CACHE IXIN    I$ index invalidate
   00000  CACHE IXLTG   I$ index load tag
   00100  CACHE IXSTG   I$ index store tag
   00001  CACHE IXLDT   I$ index load data
   00101  CACHE IXSDT   I$ index store data
There are four BTAC CACHE sub operations as listed below.
   00010  CACHE BXLBT   index load BTAC
   00110  CACHE BXSBT   index store BTAC
   01100  CACHE BFH     BTAC flush
   01010  CACHE BHINBT  hit invalidate BTAC
However, there is no problem when Status.KSU==Kernel. Please note that Status.KSU==Kernel includes the kernel mode at Status.EXL==1 or Status.ERL==1 as well. There is also no problem when Status.CU[0]==0, and Status.KSU==User mode or Supervisor mode.
2.3.2 Corrective measures
In Status.CU[0]==1 and Status.KSU==Supervisor or User, execute under VA[31]==0 when executing either index-type ICACHE or BTAC CACHE instructions. VA here represents base reg + offset.
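The conditions of this restriction can be summarized as a small predicate. The Python sketch below is an illustration under stated assumptions: the function name and the string encoding of Status.KSU are hypothetical, and the logic only restates the conditions given in 2.3.1 and 2.3.2.

```python
def index_cache_op_is_safe(ksu, exl, erl, cu0, va):
    """Return True if FLX03 does not apply to an index-type ICACHE or BTAC CACHE op.

    ksu: "kernel", "supervisor" or "user" (hypothetical encoding of Status.KSU)
    exl, erl, cu0: values of Status.EXL, Status.ERL and Status.CU[0] (0 or 1)
    va: base register + offset of the CACHE instruction
    """
    # Kernel mode includes any mode with Status.EXL==1 or Status.ERL==1.
    if ksu == "kernel" or exl == 1 or erl == 1:
        return True
    # No problem in User or Supervisor mode when Status.CU[0]==0.
    if cu0 == 0:
        return True
    # Otherwise the corrective measure requires VA[31]==0.
    return (va >> 31) & 1 == 0
```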
2.4 kuseg becomes an uncached area when an error exception (Status.ERL = 1) occurs (FLX04)
2.4.1 Phenomenon
There are cases in which kuseg (0x0000_0000 - 0x7FFF_FFFF) becomes uncached in an error exception handler (Status.ERL==1) and data consistency with cached area (kseg, ksseg, kseg0) is lost.
2.4.2 Corrective measures
In an error exception handler (Status.ERL==1), when accessing kuseg (0x0000_0000 - 0x7FFF_FFFF), guard the access with SYNC.L as follows:
   SYNC.L
   SW kuseg
2.5 First two instructions in an exception handler are executed as NOP when a bus error occurs (FLX05)
2.5.1 Phenomenon
There are cases in which the first two instructions in an exception handler are executed as NOP instructions, when a certain exception occurs and a bus error then occurs immediately before jumping to the exception handler.
2.5.2 Corrective measures
Place NOP in the first two instruction locations in all exception handlers.
2.6 Unexpected instruction-fetch bus-errors occur when executing a Crashme program (FLX06)
2.6.1 Phenomenon
In Kernel mode or Supervisor mode, unexpected instruction-fetch bus errors occur when attempting to execute the Linux program called "Crashme", because it executes prohibited instruction sequences that do not obey the programming restrictions given in 2.6.2. In User mode, this phenomenon does not occur.
2.6.2 Corrective measures
In Kernel mode or Supervisor mode, obey the following programming restrictions:
1) Any CACHE instruction must not be placed in a branch delay slot.
2) SYNC.P must be located immediately before or immediately after any CACHE instruction.
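The two restrictions lend themselves to a mechanical check. The following Python sketch is an illustration only: it scans a list of instruction mnemonics and reports CACHE instructions that break either rule. The function name and the `BRANCHES` set are hypothetical, and the set is a small illustrative subset, not the complete list of C790 branch and jump instructions.

```python
# Illustrative subset of branch/jump mnemonics (assumption, not exhaustive).
BRANCHES = {"BEQ", "BNE", "J", "JAL", "JR", "JALR"}


def check_cache_rules(mnemonics):
    """Return a list of (index, message) for CACHE instructions breaking FLX06 rules."""
    problems = []
    for i, op in enumerate(mnemonics):
        if op != "CACHE":
            continue
        prev_op = mnemonics[i - 1] if i > 0 else None
        next_op = mnemonics[i + 1] if i + 1 < len(mnemonics) else None
        # Rule 1: the instruction right after a branch sits in its delay slot.
        if prev_op in BRANCHES:
            problems.append((i, "CACHE in branch delay slot"))
        # Rule 2: SYNC.P must be immediately before or immediately after.
        if prev_op != "SYNC.P" and next_op != "SYNC.P":
            problems.append((i, "no adjacent SYNC.P"))
    return problems
```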
Chapter 2 Architecture Overview
2.
Architecture Overview
This chapter includes an overview of the C790 architecture. It discusses the following items:
* Block diagram and main modules
* Superscalar pipeline operation
* Instruction set
* Registers
* Memory Management
* Cache Memory
* Bus interface
* Floating Point Unit
* Performance Monitors
* Debug Support
2.1 Block Diagram and Functional Block Descriptions
This section presents a block diagram of the main modules of the C790 and summarizes the modules.
[Block diagram. Blocks shown: PC Unit (PC pipe and 64-entry fully associative BTAC); ITLB (2 entries); Instruction Cache (Tag, BHT, Predecode and Inst RAMs; 32 KB, 2-way set associative); I-Cache Output Pipeline Control; MMU (48-entry TLB, COP0 registers, TLB Refill Bus); Issue Logic and Staging Registers (2-issue, in-order); GPR (32 x 128-bit wide registers); FPR (32 x 64-bit wide registers); Operand/Bypass Logic; I0, I1, LS, BR and C1 (COP1/FPU) execution pipes; Virtual Address Computation Logic; DTLB (4 entries); Data Cache (32 KB, 2-way set associative); Response Buffer; WBB; UCAB; Result and Move Buses; BIU Bus; Bus Interface Unit; CPU Bus (128 bits). Addresses shown: Instruction Virtual Address (IVA), Instruction Physical Address (IPA), Data Virtual Address (DVA), Data Physical Address (DPA).]
Figure 2-1. C790 Block Diagram
2.1.1
PC Unit
The 32-bit Program Counter (PC) holds the address of the instruction which is being executed. The PC unit also contains a 64-entry Branch Target Address Cache (BTAC), which stores branch target addresses used during branch prediction.
2.1.2
MMU
The Memory Management Unit supports the address translation functions of the CPU. It supplies the DTLB (Data Translation Lookaside Buffer) and ITLB (Instruction Translation Lookaside Buffer) with data via the TLB Refill Bus. Usage of these buffers is described in chapter 6.
2.1.3
Caches
Operation of the Instruction Cache and the Data Cache is described in Chapter 7. For each branch instruction present in the instruction cache, two bits of branch history are stored in the Branch History Table (BHT).
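This section does not state how the two history bits are interpreted. Purely as an illustration, the Python sketch below shows one common textbook scheme, a 2-bit saturating counter. Treating the BHT entry as such a counter is an assumption and is not confirmed by this manual; the function names are hypothetical.

```python
# ASSUMPTION: a 2-bit saturating counter (0..3); not confirmed for the C790.
def update_history(state, taken):
    """Move the counter toward 3 on a taken branch and toward 0 otherwise."""
    return min(state + 1, 3) if taken else max(state - 1, 0)


def predict_taken(state):
    """Predict taken in the upper half of the counter range (states 2 and 3)."""
    return state >= 2
```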
2.1.4
Issue Logic and Staging Registers
The issue logic decides how to route instructions to appropriate pipes. It issues up to 2 instructions every cycle. Routing is described and discussed later in section 2.2.
2.1.5
GPR (General Purpose Registers) and FPR (Floating-Point Registers)
The General-Purpose Registers and the Floating-Point Registers are discussed in Section 2.3.
2.1.6
The Five Execution Pipes
2.1.6.1
I0 and I1 Pipes
There are two integer ALU pipelines (I0 and I1), each of which contains a complete 64-bit ALU, Shifter and Multiply-Accumulate unit. The I0 pipeline contains the SA register used for funnel shift operations. The two 64-bit ALU pipelines can be configured dynamically (on an instruction-by-instruction basis) into a single 128-bit execution pipeline to execute 128-bit Multimedia ALU, Shift and Multiply-Accumulate instructions. Furthermore, the two ALU pipelines share a single 128-bit multimedia aligner.
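To illustrate what treating the 128-bit data path as independent lanes means, the Python sketch below adds two 128-bit values lane by lane for the lane widths listed in Section 1.1 (8, 16, 32 or 64 bits). It is a conceptual model only: the function name is hypothetical, and wrap-around (modular) arithmetic in each lane is an assumption chosen for simplicity, not a description of any particular C790 instruction.

```python
def parallel_add(a, b, lane_bits):
    """Add two 128-bit integers lane by lane; carries never cross a lane boundary."""
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane width must be 8, 16, 32 or 64 bits")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 128, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # drop the carry out of the lane
    return result
```

With 8-bit lanes, 0xFF + 0x01 wraps to 0x00 inside the lowest lane; with 16-bit lanes the same operands give 0x0100 because the carry stays inside the wider lane.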
2.1.6.2
LS - Load/Store Pipe
The Load/Store (LS) pipe contains logic to support a single 128-bit Load and Store instruction.
2.1.6.3
BR - Branch Pipe
The Branch (BR) pipe contains logic to implement a single Branch instruction including Branch comparators.
2.1.6.4
C1 - COP1/FPU Pipe
The C1 pipe contains logic to support a single- and double-precision Floating Point coprocessor unit (COP1).
2.1.7
Operand/Bypass logic
This module takes data from the GPRs and from the Result and Move Buses, and routes the data to the pipelines.
2.1.8
Response Buffer and Writeback Buffer
The Writeback Buffer (WBB) is an 8 entry by 16 byte (one quadword) FIFO queuing up stores prior to accessing the CPU bus. It increases C790 performance by decoupling the processor from the latencies of the CPU bus. It is also used during the gathering operation of uncached accelerated stores; sequential stores less than a quadword in length are gathered in the WBB, thereby reducing bus bandwidth usage.
2.1.9
UCAB
The Uncached Accelerated Buffer (UCAB) is a 1 entry by 8 quadword buffer. It caches 128 sequential bytes of data during an uncached accelerated load miss. Subsequent loads from the uncached accelerated address space get their data from this buffer if the address hits in the UCAB, thereby eliminating bus latencies and providing higher performance.
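The hit check described above can be sketched as follows. This Python fragment is an illustration under an assumption: it supposes that the 128 buffered bytes form a naturally aligned 128-byte block, which the text does not state. The function names are hypothetical.

```python
UCAB_BYTES = 8 * 16  # 1 entry of 8 quadwords = 128 bytes


def ucab_block_base(addr):
    """Base address of the 128-byte block containing addr (alignment is assumed)."""
    return addr & ~(UCAB_BYTES - 1)


def ucab_hit(buffered_base, addr):
    """True if addr falls inside the block currently held in the buffer."""
    return buffered_base is not None and ucab_block_base(addr) == buffered_base
```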
2.1.10 Result and Move Buses
The Result and Move Buses convey data between execution units, the data cache, and the Operand/Bypass Logic unit.
2.1.11 Bus Interface Unit and BIU Bus
The BIU connects the core to the rest of the system. It interfaces the core's internal bus signals to the CPU Bus.
2.2 Superscalar Pipeline Operation
The C790 has a six-stage superscalar pipeline. It can fetch, decode and execute a maximum of two instructions in parallel each cycle. This section discusses in more detail the five execution pipelines listed in Section 2.1. It also discusses how instructions are routed among pipes.
2.2.1
Integer Instruction Pipeline Stages
The C790 contains four integer pipelines: the I0 and the I1 pipes, and the Load/Store and Branch pipes. Each pipe consists of the following six stages with each stage having 2 phases:
* I: Instruction Address Select
* Q: Instruction Queue
* R: Register Fetch
* A: Execution
* D: Data Fetch
* W: Write-back
Figure 2-2 shows the six stages of an integer instruction pipeline.
[Pipeline diagram: instructions enter in pairs, one pair per cycle, and each pair advances through the I, Q, R, A, D and W stages. One column is marked "Current CPU Cycle".]
Figure 2-2. C790 Integer Instruction Pipeline
I: Instruction Address Select
During the I stage, the following occurs:
* The sequential address is calculated
* The branch address is calculated
* The instruction address is selected from the following sources
  * Sequential address
  * Actual Branch / Jump address
  * Predicted Branch Target address from the BTAC
  * Exception vector address
  * EPC and Error PC

Q: Instruction Queue
During the Q stage, the following occurs:
* The instruction translation look-aside buffer (ITLB) does the virtual-to-physical address translation
* The instruction cache (data, Tag, steering bits & BHT) fetch begins
* TLB read for instruction fetch starts
* The instruction cache fetch is completed
* TLB read for instruction fetch completes
* The instruction cache Tag hit check is determined and the way selection is done
* The appropriate instructions are selected by the steering bits

R: Register Fetch
During the R stage, the following occurs:
* Instructions are bussed to the appropriate execution units
* Register file is read
* Execution unit structural hazards are determined
* Instructions are decoded, data dependencies are determined and the appropriate instructions are issued

A: Execution
During the A stage, the following occurs:
* Results from the D or W stages are bypassed
* The execution units start and complete the integer arithmetic, logical, shift and multimedia instructions
* The iterative steps of the Multiply, Multiply-Accumulate, or Divide instructions are executed
* The virtual address for load and store instructions is calculated
* The branch condition is determined
* The DTLB is read
* The Data Cache and UCAB read starts

D: Data Fetch
During the D stage, the following occurs:
* The TLB read for a data access
* The Data Cache and UCAB read is completed
* The Data Cache Tag checking is completed
* Load or register data is obtained from COP1 (FPU)
* COP0 registers are read
* Data alignment and way selection is done for the data from the Data Cache
* Data sign extension is done
* Complete updating BHT bits and the BTAC
* All the exceptions are detected

W: Write Back
During the W stage, the following occurs:
* For store operations data is written to the Data Cache
* Data for coprocessor data transfer instructions is transferred to COP1 (FPU)
* For register-to-register and load instructions, the result is written to the register file
* COP0, COP1 (FPU) registers are written for coprocessor data transfer instructions
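The stage sequence described in this section can be modeled in a few lines. The Python sketch below is an illustration only and assumes an ideal pipeline with no stalls, in which an instruction advances one stage per cycle; the function name is hypothetical.

```python
STAGES = "IQRADW"  # the six integer pipeline stages, in order


def stage_at(start_cycle, cycle):
    """Stage occupied at `cycle` by an instruction that entered I at `start_cycle`.

    Assumes no stalls. Returns None before the instruction enters the
    pipeline or after it has left the W stage.
    """
    offset = cycle - start_cycle
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None
```

For example, in cycle 5 the instructions that entered the I stage in cycles 0 through 5 occupy the W, D, A, R, Q and I stages respectively, which is the situation drawn in Figure 2-2.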
2.2.2
C1 (COP1/FPU) Instruction Pipeline Stages
The C790's C1 (COP1/FPU) pipeline consists of the following eight stages:
* I: Instruction Address Select
* Q: Instruction Queue
* R: Register Fetch
* T: COP1 Register Fetch
* X: FP Execution 1st Stage
* Y: FP Execution 2nd Stage
* Z: FP Execution 3rd Stage
* S: Register File Write Stage
The eight stages of the pipeline for COP1/FPU are shown in Figure 2-3 with some pipeline stages identified with two letters. COP1 instructions execute simultaneously in the main integer pipeline I0 and the coprocessor 1 pipeline. The first letter identifies the main integer pipeline stage and the second letter identifies the coprocessor pipeline stage.
[Pipeline diagram: each COP1 instruction passes through the stages I, Q, R, A/T, D/X, W/Y, Z and S, with successive instructions offset by one cycle. One column is marked "Current CPU Cycle".]
Figure 2-3. FPU Pipeline
The I, Q, and R stages were previously described in Section 2.2.1. The following describes stages specific to the COP1 pipeline:

T: COP1 Register Fetch
During the T stage, the following occurs:
* Register file read for operands
* Bypass muxes from the S Stage/W Stage for S/T overlap.

X: FP Execution 1st Stage
This stage is the first step for floating point operations. During the X stage, the following occurs:
* Detect Exceptions for input data.
* Detect Exception possibilities for result.
* The Booth function/Wallace multiplication is performed for multiply, the denormalization is performed for add/subtract.

Y: FP Execution 2nd Stage
This stage is the second step for floating point operations. The following occurs:
* Test overflow/underflow on exponent is done
* Normalization for multiplication is done.
* Add/subtract the significand for add/subtract operations.
* Count leading zeros, to determine the shift amount for the normalization

Z: FP Execution 3rd Stage
This stage is the third step for floating point operations. The following occurs:
* Overflow/underflow detection
* Exponent readjustment
* Shift the significand for normalization
* Round the result
* Detect inexact exception

S: Register File Write Stage
During the S stage, the following occurs:
* FPR registers are written.
* FCSR31 is updated.
* Bypass values are passed to the T stage.
2.2.3
Classification and Routing of Instructions According to Execution Pipelines
This section discusses how the five execution pipelines are used in conjunction with instruction routing. Figure 2-4 identifies the specific execution pipelines into which instructions of a particular class are routed, and shows which physical execution units handle instructions from a particular logical pipe. Instruction categories are identified in italics, and are shown within the physical pipes where they are executed. ALU instructions can be executed in either integer pipe I0 or I1. COP1 Operate, and COP1 Move instructions execute in two pipes as shown, as does the Wide Operate.
[Routing diagram: Logical Pipe0 and Logical Pipe1 feed the physical pipes. I0 pipe: ALU, SA Operate, MAC0. I1 pipe: ALU, SYNC, ERET, COP0, MAC1. Wide Operate spans the I0 and I1 pipes. LS pipe: Load/Store, Prefetch, CACHE. BR pipe: Branch. C1 Compute: COP1 Operate. C1 Move: COP1 Move.]
Figure 2-4. Instruction Routing in Logical Pipes and Physical Pipes
Table 2-1 shows the categories of instructions and the execution pipelines that can execute those instructions. The instructions in a single category have the same issuing policy. Instructions which require more than a single execution pipeline are identified in the pipeline column with the (&) symbol. For example, COP1 Move requires both the LS and the C1 execution pipelines. On the other hand, the ALU instructions can be executed in either the I0 or the I1 execution pipelines.
Table 2-1. Categories of Instructions and How They Are Routed
Category          Execution Pipeline  Instructions
Load/Store        LS                  Load, Store, Wide Load, Wide Store, Prefetch, CACHE
SYNC              I1                  Synchronization
ERET              I1                  Exception return
SA Operate        I0                  Move to/from SA register
COP0              I1                  COP0 Coprocessor move, COP0 Coprocessor operations
COP1 Move (1)     LS & C1             COP1 Coprocessor move, COP1 Coprocessor Load/Store
COP1 Operate (2)  I0 & C1             COP1 Operate instructions
ALU (3)           I0 or I1            Arithmetic, Shift, Logical, Trap, SYSCALL, BREAK
MAC0              I0                  Multiply and Multiply-Accumulate for HI/LO register, MFHI/LO, MTHI/LO
MAC1              I1                  Multiply and Multiply-Accumulate for HI1/LO1 register, MFHI1/LO1, MTHI1/LO1
Branch            BR                  Branch, Jump, Jump/Link, all Coprocessor Branches
Wide Operate (4)  I0 & I1             Wide ALU, Wide shift, Wide MAC, Funnel shift, Wide HI/LO Moves
(1) COP1 Move instructions execute concurrently in the LS and the C1 pipes.
(2) COP1 Operate instructions execute concurrently in the I0 and the C1 pipes.
(3) ALU instructions can be executed in either the I0 or the I1 pipes.
(4) Wide Operate instructions execute concurrently in the I0 and the I1 pipes.
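The routing rules of Table 2-1 can be expressed as a lookup table. The Python sketch below is an illustration only: the data structure and the function name `can_execute` are hypothetical. Each category maps to the physical pipes it uses and to a rule saying whether all of them are needed together (the "&" cases) or any one suffices (the ALU case).

```python
# (pipes, rule): "all" = every listed pipe is needed at once, "any" = one suffices.
ROUTING = {
    "Load/Store":   (("LS",), "all"),
    "SYNC":         (("I1",), "all"),
    "ERET":         (("I1",), "all"),
    "SA Operate":   (("I0",), "all"),
    "COP0":         (("I1",), "all"),
    "COP1 Move":    (("LS", "C1"), "all"),
    "COP1 Operate": (("I0", "C1"), "all"),
    "ALU":          (("I0", "I1"), "any"),
    "MAC0":         (("I0",), "all"),
    "MAC1":         (("I1",), "all"),
    "Branch":       (("BR",), "all"),
    "Wide Operate": (("I0", "I1"), "all"),
}


def can_execute(category, free_pipes):
    """True if the free physical pipes are sufficient for the given category."""
    pipes, rule = ROUTING[category]
    available = [pipe in free_pipes for pipe in pipes]
    return all(available) if rule == "all" else any(available)
```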
2.2.4
Instruction Issue Combinations
The C790 always fetches two instructions. A pair of staging registers acts as a 'bellows' between the Q and the R stage. If an instruction can't be issued in a particular cycle, it is saved in the staging registers. In the next cycle the C790 again fetches two instructions and tries to issue two (the one left over in the staging register from the previous cycle and the next sequential one from the pair that is fetched). So the C790 always tries to issue two instructions each cycle whenever it can. The two instructions that get issued go to the R-stage of the pipeline and get associated with one of two logical pipes: Pipe0 and Pipe1. The instructions are then routed to an appropriate physical pipe for processing.
Instruction categories that can get issued to logical Pipe0 are:
1. ALU
2. Branch
3. Wide Operate
4. SA Operate
5. MAC0
6. COP1 Operate
An alternate way to view this is to recognize that logical Pipe0 is made up of the I0, C1 and BR execution pipelines. When issuing Wide Operate instructions logical Pipe0 also uses the I1 execution pipeline.
Instruction categories that can get issued to logical Pipe1 are:
1. ALU
2. Branch
3. SYNC
4. ERET
5. Load/Store
6. COP1 Move
7. COP0
8. MAC1
An alternate way to view this is to recognize that logical Pipe1 is made up of the I1, LS, C1 and BR execution pipelines.
All instruction categories are statically bound to a single logical pipe, that is, they can only be issued to a particular logical pipe. However, the ALU and Branch instruction categories can get issued to either of the two logical pipes. Thus the binding of these two instruction categories to a particular logical pipe is done at instruction issue time.
There are some special cases of instruction sequences that are not allowed in the MIPS ISA. An instruction from the Branch category is not allowed to have another instruction from either the Branch or ERET category in its branch delay slot. So the following pairs of instructions are illegal and effectively never issued together:
1. Branch - Branch
2. Branch - ERET
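The logical pipe binding described above can be sketched as follows. This Python fragment is an illustration only: it uses just the two category lists and the two illegal pairs given in the text. It deliberately ignores data dependencies and the further restrictions of this section and of Table 2-2, and the function name is hypothetical.

```python
PIPE0 = {"ALU", "Branch", "Wide Operate", "SA Operate", "MAC0", "COP1 Operate"}
PIPE1 = {"ALU", "Branch", "SYNC", "ERET", "Load/Store", "COP1 Move", "COP0", "MAC1"}
ILLEGAL_PAIRS = {("Branch", "Branch"), ("Branch", "ERET")}


def assign_logical_pipes(first, second):
    """Return (pipe of first, pipe of second), or None if the pair cannot dual-issue."""
    if (first, second) in ILLEGAL_PAIRS:
        return None
    if first in PIPE0 and second in PIPE1:
        return (0, 1)
    if first in PIPE1 and second in PIPE0:
        return (1, 0)
    return None  # both categories are bound to the same logical pipe
```

Two ALU instructions can be issued together because the ALU category may go to either pipe, whereas MAC0 and SA Operate are both bound to Pipe0 and therefore cannot.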
The following sequences of instructions are also not allowed in the C790. Branch-Likely instructions are a subset of the Branch category (limited to the branch likely instructions).
1. Branch - SYNC.P
2. Branch - SYNC.L
3. Branch - CACHE *1
4. Branch-Likely - MTSA
5. Branch-Likely - MTSAB
6. Branch-Likely - MTSAH
7. Branch-Likely - TLBR *2
8. Branch-Likely - TLBWI *2
9. Branch-Likely - TLBWR *2
*1 CACHE instruction must be guarded by Sync instructions.
   Sync.P           Sync.L
   CACHE I$   or    CACHE D$
   Sync.P           Sync.L
*2 TLBR, TLBWI, TLBWR instructions must be followed by Sync.P
   TLBxx
   Sync.P
The following table shows the instruction categories which can be issued concurrently to the two logical pipes. All combinations are legal except the ones marked with an "X". The combinations marked with a "Y" can be issued concurrently, i.e., they enter the R stage together, but then the younger instruction stalls in the A stage for a single cycle in order to avoid a resource hazard.
Table 2-2. Concurrently Issued Instruction Categories
Columns (LOGICAL PIPE0): SA Oper., COP1 Oper., ALU, MAC0, Branch, Wide Oper.
Rows (LOGICAL PIPE1): Load/Store, ERET, SYNC, LZC, COP1 Move, ALU, MAC1, Branch, COP0
Marks: X Y Y Y X
X: illegal combination
Y: Can be issued concurrently but it will stall due to a structural hazard.
2.3 Registers
The C790 extends the normal MIPS compatible register set by extending the general purpose registers (GPRs) from 64 bits to 128 bits, adding an additional pair of HI/LO registers for the I1 pipe, and adding the SA register for the funnel shift instruction.
2.3.1
CPU Registers
The C790 has 128-bit wide GPRs. The upper 64 bits of the GPRs are only used by the C790-specific "Quad Load/Store", and "Multimedia (Parallel)" instructions. The HI1 and LO1, which are the upper 64 bits of each of the 128-bit HI and LO registers, are also used by new multiply and divide instructions, such as MULT1, MULTU1, DIV1, DIVU1, MADD1, MADDU1, MFHI1, MFLO1, MTHI1, and MTLO1, which are nonparallel I1 pipeline-specific instructions. The SA register contains the shift amount used by the 256 bit funnel shift instruction.
2.3.2
FPU Registers
The floating point unit (COP1) has 64-bit wide floating point registers. It also contains 2 floating point control registers.
2.3.3
COP0 Registers
Table 2-3 identifies the COP0 registers of the C790.
Table 2-3. Coprocessor 0 Registers
No.  Register Name  Purpose    Description
0    Index          MMU        Programmable register to select TLB entry for reading or writing
1    Random         MMU        Pseudo-random counter for TLB replacement
2    EntryLo0       MMU        Low half of TLB entry for even PFN (Physical page number)
3    EntryLo1       MMU        Low half of TLB entry for odd PFN (Physical page number)
4    Context        Exception  Pointer to kernel virtual PTE table
5    PageMask       MMU        Mask that sets the TLB page size
6    Wired          MMU        Number of wired TLB entries
7    (Reserved)     Undefined  Undefined
8    BadVAddr       Exception  Bad virtual address
9    Count          Exception  Timer count
10   EntryHi        MMU        High half of TLB entry (Virtual page number and ASID)
11   Compare        Exception  Timer compare
12   Status         Exception  Processor Status Register
13   Cause          Exception  Cause of the last exception taken
14   EPC            Exception  Exception Program Counter
15   PRId           MMU        Processor Revision Identifier
16   Config         MMU        Configuration Register
17   (Reserved)     Undefined  Undefined
18   (Reserved)     Undefined  Undefined
19   (Reserved)     Undefined  Undefined
20   (Reserved)     Undefined  Undefined
21   (Reserved)     Undefined  Undefined
22   (Reserved)     Undefined  Undefined
23   BadPAddr       Exception  Bad Physical Address
24   Debug          Debug      This is used for Debug function
25   Perf           Exception  Performance Counter and Control Register
26   (Reserved)     Undefined  Undefined
27   (Reserved)     Undefined  Undefined
28   TagLo          MMU        Cache Tag register (low bits)
29   TagHi          MMU        Cache Tag register (high bits)
30   ErrorPC        Exception  Error Exception Program Counter
31   (Reserved)     Undefined  Undefined
2.4 Memory Management
The C790 processor provides a memory management unit (MMU) which uses an on-chip translation look-aside buffer (TLB) to translate virtual addresses into physical addresses. The C790 supports the MIPS compatible 32-bit address and 64-bit data mode. Only 32-bit virtual and physical addresses have been implemented. There is no requirement for address sign extension. Address error exception checking is not done on the "upper" 32 bits (which are ignored). The only conditions that generate the address error exception are address alignment errors and segment protection errors. In Kernel mode, the program counter can wrap around from kseg3 to kuseg without causing an address error exception.
Since there is only one addressing mode, all four MIPS ISAs (I, II, III, IV) and the C790-specific ISA are available without any restrictions in all three processor modes (with the appropriate MIPS ISA coprocessor usable restrictions). As such, the reserved instruction (RI) exception occurs only when the processor actually tries to execute an undefined opcode.
Features
* MIPS III-compatible 32-bit MMU
* Operating Modes: User, Supervisor, and Kernel
* TLB: 48 entries of even/odd page pairs (96 pages), fully associative
* Page Size: 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB
* ITLB: 2 entries
* DTLB: 4 entries
* Address Sizes:
  * Virtual Address Size = 32 bit, 2 Gbyte per user process
  * Physical Address Size = 32 bit, 4 Gbyte
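As a worked example of the figures above, the Python sketch below computes how much address space the TLB can map at once (48 entries of two pages each) for each supported page size. The function and constant names are hypothetical; the numbers come directly from the feature list.

```python
TLB_ENTRIES = 48
PAGES_PER_ENTRY = 2  # each entry maps an even/odd page pair

# Supported page sizes in bytes, from the feature list.
PAGE_SIZES = {
    "4 KB": 4 << 10, "16 KB": 16 << 10, "64 KB": 64 << 10, "256 KB": 256 << 10,
    "1 MB": 1 << 20, "4 MB": 4 << 20, "16 MB": 16 << 20,
}


def tlb_reach(page_bytes):
    """Bytes of address space mapped when every TLB entry uses the given page size."""
    return TLB_ENTRIES * PAGES_PER_ENTRY * page_bytes
```

With 4 KB pages the 96 pages cover 384 KB; with 16 MB pages they cover 1536 MB, which is still less than the 2 Gbyte user address space.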
2.5 Cache Memory
The C790 core contains both an instruction cache and a separate data cache.
Features
The following are the main features of the caches:
* Separate Instruction Cache and Data Cache
* Virtually indexed and physically tagged caches
* Write-back policy for the Data Cache
* Data Cache and Instruction Cache burst read sequential ordering
* Cache Size: Instruction Cache: 32 KB, Data Cache: 32 KB
* Line Size: 64 Bytes
* Refill size: 64 Bytes
* Associativity: 2-way set-associative
* Write Policy: Write-back and write allocate
* Data order for block reads: Sequential ordering
* Data order for block writes: Sequential ordering
* Instruction cache miss restart: After all data received
* Data cache miss restart: Early restart on first quadword
* Cache parity: No
* Cache Locking: Data Cache Line Lock. Controlled by CACHE instruction
* Cache Snooping: No
* Non-blocking load: Yes
* Hit Under Miss: Yes (Multiple hits under one miss are supported)
* Data Cache Prefetch: Yes
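The cache geometry follows from the sizes listed above. The Python sketch below is a worked example: 32 KB divided by 2 ways and 64-byte lines gives 256 sets per cache. The way the index is taken from the address (the bits directly above the line offset) is the conventional arrangement and is an assumption here; the function name is hypothetical.

```python
CACHE_BYTES = 32 * 1024
WAYS = 2
LINE_BYTES = 64

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 256 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 6 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1         # 8 bits select the set


def cache_index(virtual_address):
    """Set index for an address, ASSUMING the index is address bits [13:6]."""
    return (virtual_address >> OFFSET_BITS) & (NUM_SETS - 1)
```

Because offset and index together span 14 bits, the index extends above the 12-bit offset of a 4 KB page, which is consistent with the caches being described as virtually indexed.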
2.6 Bus Interface
The C790 CPU core is connected to the rest of the system, and to external devices, through the group of on-chip C790 system bus signals called the CPU Bus.
Features
* Separate data and address buses (demultiplexed operation)
* 128-bit data bus
* Clocked synchronous operations
* Peak transfer rate of 2.1 GB/sec (@133 MHz bus clock)
* 8/16/32/64/128-bit and burst accesses
* Multimaster capability
* Pipelined operations
* No turn-around or dead cycles between transfers
The CPU Bus does not provide:
* Cache coherency support
* Split transactions
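The quoted peak transfer rate follows directly from the bus width and the bus clock. A short sketch of the arithmetic, assuming one 128-bit transfer per bus clock (the function name is illustrative):

```python
def peak_transfer_rate(bus_bits=128, clock_hz=133_000_000):
    """Peak bytes per second, assuming one full-width transfer per clock."""
    return (bus_bits // 8) * clock_hz


rate = peak_transfer_rate()
print(rate / 1e9)   # about 2.1 (GB/sec)

# A 64-byte cache line refill needs this many 128-bit bus transfers.
print(64 // (128 // 8))
```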
2.7 Floating Point Unit
The floating point unit is IEEE 754-1985 compatible, in the same way as the FPU in the TX49HF CPU core.
Main Features
* Tightly coupled to the C790 integer pipeline.
* Supports both the double and single precision formats defined in the IEEE-754 specification.
* No hardware support for the denormalized numbers of the IEEE-754 specification; software (an exception handler) supports them.
* The FPU supports five IEEE exceptions and one MIPS-defined exception.
* ADD, SUB, MUL, DIV, ABS, MOV, NEG, SQRT, compare and convert are supported.
2.8 Performance Counter
The performance counter provides the means for gathering statistical information about the internal events of the CPU and the pipeline during program execution. The statistics gathered during program execution aid in tuning the performance of hardware and software systems based on the processor. The performance counter consists of one control register and two counters. The control register controls the functions of the performance counter, while the counters count the number of events specified by the control register.
Features:
* Two performance counter registers
* Over twenty different events within the processor can be counted
* Counting can be selectively enabled in User, Supervisor, Kernel, and Exception modes
2.9 Debug and Tracing Functions
The C790 supports real-time PC tracing. Pipeline status, target addresses of indirect jumps, and exception vectors are made available on special signals. The executed instruction sequence can be reconstructed from these signals and the source program.
Features:
* One Instruction Address Breakpoint register
* One Instruction Address Breakpoint Mask register
* One Data Address Breakpoint register
* One Data Address Breakpoint Mask register
* One Data Value Breakpoint register
* One Data Value Breakpoint Mask register
* Each breakpoint individually enabled
* Breakpoint function can be selectively enabled in User, Supervisor, Kernel, and Exception modes
* External Trigger signal can be generated when a breakpoint occurs
* 11 signals used to provide the real-time PC tracing function
Chapter 3 Instruction Set Overview and Summary
3. Instruction Set Overview and Summary
This chapter provides an overview of the C790 instruction set. Refer to Appendices A - D for detailed descriptions of individual instructions.
3.1 Introduction
The C790 supports all MIPS III instructions with the exception of the 64-bit multiply, 64-bit divide, Load Linked and Store Conditional instructions. It also supports a limited number of MIPS IV instructions and additional C790-specific instructions, such as Multiply/Add instructions and multimedia instructions. The instruction set can be divided into the following groups:
* Load and Store
* Computational
* Jump and Branch
* Miscellaneous
* System Control Coprocessor (COP0)
* Coprocessor 1 (COP1)
* C790-specific
3.2 CPU Instruction Set Formats
There are three instruction formats: immediate (I-type), jump (J-type), and register (R-type), as shown in Figure 3-1. The use of a small number of instruction formats simplifies instruction decoding (and so permits higher operating frequencies) and allows the compiler to synthesize more complicated (and less frequently used) operations and addressing modes from these three formats as needed.
I-type (Immediate)
  31    26 25    21 20    16 15                        0
 |   op   |   rs   |   rt   |        immediate          |

J-type (Jump)
  31    26 25                                          0
 |   op   |                  target                     |

R-type (Register)
  31    26 25    21 20    16 15    11 10     6 5       0
 |   op   |   rs   |   rt   |   rd   |   sa   |  funct  |

op          6-bit operation code
rs          5-bit source register specifier
rt          5-bit target (source/destination) register or branch condition
immediate   16-bit immediate value, branch displacement or address displacement
target      26-bit jump target address
rd          5-bit destination register specifier
sa          5-bit shift amount
funct       6-bit function field

Figure 3-1. CPU Instruction Formats
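The field layout in Figure 3-1 can be expressed as shifts and masks. The sketch below packs and unpacks the three formats; the function names are illustrative and are not part of any C790 toolchain.

```python
def encode_i(op, rs, rt, immediate):
    """Pack an I-type word: op[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | (immediate & 0xFFFF)


def encode_j(op, target):
    """Pack a J-type word: op[31:26] target[25:0]."""
    return ((op & 0x3F) << 26) | (target & 0x03FFFFFF)


def encode_r(op, rs, rt, rd, sa, funct):
    """Pack an R-type word: op rs rt rd[15:11] sa[10:6] funct[5:0]."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | ((sa & 0x1F) << 6) | (funct & 0x3F)


def decode(word):
    """Extract every field of Figure 3-1 from a 32-bit instruction word.

    Which fields are meaningful depends on the format selected by op
    (and by funct for R-type instructions).
    """
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "target": word & 0x03FFFFFF,
    }


word = encode_r(op=0, rs=1, rt=2, rd=3, sa=0, funct=0x21)
print(hex(word), decode(word)["rd"])
```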
3.3 Instruction Set Summary
The C790 supports MIPS III instructions (see Note 1) as well as a limited number of MIPS IV instructions. A large number of C790-specific instructions, such as multiply/add instructions and multimedia instructions, have also been implemented.
3.3.1 Load/Store Instructions
The instructions in this group transfer data of different sizes: bytes, halfwords, words, doublewords and quadwords. Signed and unsigned integers of different sizes are supported by loads that either sign-extend or zero-extend the data loaded into the register. Load and store instructions are immediate (I-type) instructions that move data between memory and the general registers. The only addressing mode that load and store instructions directly support is base register plus 16-bit signed immediate offset.
3.3.1.1 Normal Loads and Stores
The C790 does not support Load Linked and Store Conditional instructions, LL, LLD, SC and SCD. For details of these instructions refer to Appendix A.
Table 3-1. Load / Store Instructions

Mnemonic   Description                Defined in
LB         Load Byte                  MIPS I
LBU        Load Byte Unsigned         MIPS I
LD         Load Doubleword            MIPS III
LDL        Load Doubleword Left       MIPS III
LDR        Load Doubleword Right      MIPS III
LH         Load Halfword              MIPS I
LHU        Load Halfword Unsigned     MIPS I
LW         Load Word                  MIPS I
LWL        Load Word Left             MIPS I
LWR        Load Word Right            MIPS I
LWU        Load Word Unsigned         MIPS III
SB         Store Byte                 MIPS I
SD         Store Doubleword           MIPS III
SDL        Store Doubleword Left      MIPS III
SDR        Store Doubleword Right     MIPS III
SH         Store Halfword             MIPS I
SW         Store Word                 MIPS I
SWL        Store Word Left            MIPS I
SWR        Store Word Right           MIPS I
Note 1: The C790 does not support the following MIPS III instructions:
* 64-bit multiply and divide instructions (DMULT, DMULTU, DDIV, DDIVU)
* Semaphore instructions (LL, LLD, SC, SCD)
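Two points from 3.3.1 lend themselves to a short illustration: the base-plus-signed-offset address calculation, and the difference between the sign-extending loads (for example LB) and the zero-extending loads (for example LBU). The sketch below assumes a 32-bit address and a 64-bit destination width; both helper names are hypothetical.

```python
def effective_address(base, offset16):
    """Base register plus sign-extended 16-bit offset, as a 32-bit address."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:            # negative offset: sign-extend
        offset -= 0x10000
    return (base + offset) & 0xFFFFFFFF


def extend(value, bits, signed, width=64):
    """Sign- or zero-extend a loaded value of `bits` bits to `width` bits.

    width=64 is an assumption made for this sketch.
    """
    value &= (1 << bits) - 1
    if signed and (value >> (bits - 1)) & 1:
        value -= 1 << bits
    return value & ((1 << width) - 1)


print(hex(effective_address(0x1000, 0xFFFC)))   # offset of -4
print(hex(extend(0x80, 8, signed=True)))        # sign-extended byte
print(hex(extend(0x80, 8, signed=False)))       # zero-extended byte
```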
3.3.1.2 Multimedia Loads and Stores
The C790 implements 128-bit (quadword) load and store instructions for multimedia purposes. For details of these instructions, refer to Appendix B.
Table 3-2. Multimedia Load / Store Instructions

Mnemonic   Description       Defined in
LQ         Load Quadword     C790
SQ         Store Quadword    C790
3.3.1.3 Coprocessor Loads and Stores
These loads and stores are coprocessor instructions. A particular coprocessor is enabled if the corresponding CU bit is set in the CP0 Status register. Otherwise, executing one of these instructions generates a Coprocessor Unusable exception. For details of these instructions, refer to Appendices C and D.
Table 3-3. Coprocessor Load / Store Instructions

Mnemonic   Description                             Defined in
LDC1       Load Doubleword to Floating Point       MIPS II
LWC1       Load Word to Floating Point             MIPS I
SDC1       Store Doubleword from Floating Point    MIPS II
SWC1       Store Word from Floating Point          MIPS I
3.3.1.4 Data Formats and Addressing
The C790 processor uses five data formats:
* 128-bit quadword
* 64-bit doubleword
* 32-bit word
* 16-bit halfword
* 8-bit byte
Byte ordering within each of the larger data formats (halfword, word, doubleword) can be configured in either big-endian or little-endian order. Endianness refers to the location of byte 0 within the multi-byte data structure. Figure 3-2 and Figure 3-3 show the ordering of bytes within words and the ordering of words within multiple-word structures for the big-endian and little-endian conventions. When the C790 processor is configured as a big-endian system, byte 0 is the most-significant (leftmost) byte, thereby providing compatibility with MC 68000(R) and IBM 370(R) conventions. Figure 3-2 shows this configuration.
                  Bit #:  31-24  23-16  15-8   7-0     Word Address
Higher Address             12     13     14     15         12
                            8      9     10     11          8
                            4      5      6      7          4
Lower Address               0      1      2      3          0

Figure 3-2. Big-Endian Byte Ordering
When configured as a little-endian system, byte 0 is always the least-significant (rightmost) byte, which is compatible with iAPX(R) x86 and DEC VAX(R) conventions.
                  Bit #:  31-24  23-16  15-8   7-0     Word Address
Higher Address             15     14     13     12         12
                           11     10      9      8          8
                            7      6      5      4          4
Lower Address               3      2      1      0          0

Figure 3-3. Little-Endian Byte Ordering
In this text, bit 0 is always the least-significant (rightmost) bit: thus, bit designations are always little-endian (although no instructions explicitly designate bit positions within words).
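The two byte orderings can be illustrated by reading the same four bytes of memory as a word under each convention. This is a minimal sketch using Python's built-in integer conversion; `load_word` is an illustrative helper, not a C790 operation.

```python
def load_word(memory, address, big_endian):
    """Read a 32-bit word from a bytes object under the given byte order."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 4], order)


# Bytes stored at addresses 0, 1, 2 and 3.
memory = bytes([0x11, 0x22, 0x33, 0x44])

# Big-endian: byte 0 is the most-significant byte of the word.
print(hex(load_word(memory, 0, big_endian=True)))
# Little-endian: byte 0 is the least-significant byte of the word.
print(hex(load_word(memory, 0, big_endian=False)))
```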
Figure 3-4 and Figure 3-5 show little-endian and big-endian byte ordering in doublewords.
          Most-significant byte                          Least-significant byte
Bit #:    63-56  55-48  47-40  39-32  31-24  23-16  15-8   7-0
Byte #:     7      6      5      4      3      2      1     0

The least-significant word occupies bits 31-0. Bits in a byte are numbered 7 6 5 4 3 2 1 0.

Figure 3-4. Little-Endian Data in a Doubleword
          Most-significant byte                          Least-significant byte
Bit #:    63-56  55-48  47-40  39-32  31-24  23-16  15-8   7-0
Byte #:     0      1      2      3      4      5      6     7

The least-significant word occupies bits 31-0. Bits in a byte are numbered 7 6 5 4 3 2 1 0.

Figure 3-5. Big-Endian Data in a Doubleword
The CPU uses byte addressing for halfword, word, doubleword, and quadword accesses with the following alignment constraints:
* Halfword accesses must be aligned on an even byte boundary (0, 2, 4...).
* Word accesses must be aligned on a byte boundary divisible by four (0, 4, 8...).
* Doubleword accesses must be aligned on a byte boundary divisible by eight (0, 8, 16...).
* Quadword accesses must be aligned on a byte boundary divisible by sixteen (0, 16, 32...).
The following special instructions load and store words and doublewords that are not aligned on 4-byte (word) or 8-byte (doubleword) boundaries:
LWL   LWR   SWL   SWR
LDL   LDR   SDL   SDR
These instructions are used in pairs to provide addressing of misaligned words. Addressing misaligned data incurs one additional instruction cycle over that required for addressing aligned data. The extra cycle comes from the second instruction of the pair (for example, LWL and LWR form a pair). Note also that the CPU moves the unaligned data at the same rate as a hardware mechanism would. Figure 3-6 and Figure 3-7 show the access of a misaligned word that has byte address 3.
                  Bit #:  31-24  23-16  15-8   7-0
Higher Address              4      5      6
Lower Address                                   3

Figure 3-6. Big-Endian Misaligned Word Addressing

                  Bit #:  31-24  23-16  15-8   7-0
Higher Address                     6      5     4
Lower Address               3

Figure 3-7. Little-Endian Misaligned Word Addressing
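The alignment constraints and the misaligned example above (a word at byte address 3) can be checked with a few lines of arithmetic. The sketch below is illustrative only and does not model the LWL/LWR instructions themselves; the helper names are assumptions.

```python
def is_aligned(address, size):
    """True if an access of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


def words_touched(address, size=4):
    """Addresses of the aligned 32-bit words covered by the access."""
    first = address - address % 4
    last_byte = address + size - 1
    last = last_byte - last_byte % 4
    return list(range(first, last + 1, 4))


# The word at byte address 3 covers bytes 3, 4, 5 and 6, which span the
# aligned words at addresses 0 and 4. This is why a pair of instructions
# is needed to access it.
print(is_aligned(3, 4))
print(list(range(3, 3 + 4)))
print(words_touched(3))
```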
3.3.1.5 Defining Access Types
Access type indicates the size of the C790 processor data item to be loaded or stored, set by the load or store instruction opcode.
Regardless of access type or byte ordering (endianness), the address given specifies the low-order byte in the addressed field. For a big-endian configuration, the low-order byte is the most-significant byte; for a little-endian configuration, the low-order byte is the least-significant byte. The access type, together with the four low-order bits of the address, defines the bytes accessed within the addressed quadword (shown in Table 3-4 and Table 3-5). Only the combinations shown in Table 3-4 and Table 3-5 are permissible; other combinations cause address error exceptions.
Table 3-4. Defining Access Types (Big-Endian)

Each entry gives the low-order address bits (3 2 1 0), followed by the bytes accessed within the 128-bit quadword (bits 127 to 0, byte 0 leftmost).

Access Type   Low-Order Address Bits: Bytes Accessed
Quadword      0000: 0-15
Doubleword    0000: 0-7     1000: 8-15
Septibyte     0000: 0-6     0001: 1-7     1000: 8-14    1001: 9-15
Sextibyte     0000: 0-5     0010: 2-7     1000: 8-13    1010: 10-15
Quintibyte    0000: 0-4     0011: 3-7     1000: 8-12    1011: 11-15
Word          0000: 0-3     0100: 4-7     1000: 8-11    1100: 12-15
Triplebyte    0000: 0-2     0001: 1-3     0100: 4-6     0101: 5-7
              1000: 8-10    1001: 9-11    1100: 12-14   1101: 13-15
Halfword      0000: 0-1     0010: 2-3     0100: 4-5     0110: 6-7
              1000: 8-9     1010: 10-11   1100: 12-13   1110: 14-15
Byte          0000 through 1111: the single byte (0 through 15) whose number equals the address bits
Table 3-5. Defining Access Types (Little-Endian)

Each entry gives the low-order address bits (3 2 1 0), followed by the bytes accessed within the 128-bit quadword (bits 127 to 0, byte 0 rightmost).

Access Type   Low-Order Address Bits: Bytes Accessed
Quadword      0000: 0-15
Doubleword    0000: 0-7     1000: 8-15
Septibyte     0000: 0-6     0001: 1-7     1000: 8-14    1001: 9-15
Sextibyte     0000: 0-5     0010: 2-7     1000: 8-13    1010: 10-15
Quintibyte    0000: 0-4     0011: 3-7     1000: 8-12    1011: 11-15
Word          0000: 0-3     0100: 4-7     1000: 8-11    1100: 12-15
Triplebyte    0000: 0-2     0001: 1-3     0100: 4-6     0101: 5-7
              1000: 8-10    1001: 9-11    1100: 12-14   1101: 13-15
Halfword      0000: 0-1     0010: 2-3     0100: 4-5     0110: 6-7
              1000: 8-9     1010: 10-11   1100: 12-13   1110: 14-15
Byte          0000 through 1111: the single byte (0 through 15) whose number equals the address bits
3.3.1.6 Scheduling a Load Delay Slot
A load instruction that does not allow its result to be used by the instruction immediately following is called a delayed load instruction. The instruction slot immediately following this delayed load instruction is referred to as the load delay slot. In the C790 processor, the instruction immediately following a load instruction can use the contents of the loaded register. In such cases, however, hardware interlocks insert additional clock cycles. Consequently, scheduling load delay slots can be desirable, both for performance and R-Series processor compatibility. However, the scheduling of load delay slots is not absolutely required.
3.3.2 Computational Instructions
The instructions in this group perform two's complement arithmetic, logical operations, or shifts on integers represented in two's complement notation. Computational instructions can be either in register (R-type) format, in which both operands are registers, or in immediate (I-type) format, in which one operand is a 16-bit immediate. Computational instructions perform the following operations on register values:
* Arithmetic
* Logical
* Shift
* Multiply
* Divide
These operations fit in the following four categories of computational instructions:
* ALU immediate instructions
* Three-Operand Register-Type instructions
* Shift instructions
* Multiply and Divide instructions
For detailed information on individual instructions, refer to Appendix A.
Note: The C790 does not support the 64-bit Multiply and Divide instructions DMULT, DMULTU, DDIV, and DDIVU.
3.3.2.1 ALU Immediate Instructions
Table 3-6. ALU Immediate Instructions

Mnemonic   Description                               Defined in
ADDI       Add Immediate                             MIPS I
ADDIU      Add Immediate Unsigned                    MIPS I
SLTI       Set on Less Than Immediate                MIPS I
SLTIU      Set on Less Than Immediate Unsigned       MIPS I
ANDI       AND Immediate                             MIPS I
ORI        OR Immediate                              MIPS I
XORI       Exclusive OR Immediate                    MIPS I
LUI        Load Upper Immediate                      MIPS I
DADDI      Doubleword Add Immediate                  MIPS III
DADDIU     Doubleword Add Immediate Unsigned         MIPS III
3.3.2.2 Three Operand Register-Type Instructions
Table 3-7. Three Operand Register-Type Instructions

Mnemonic   Description                      Defined in
ADD        Add                              MIPS I
ADDU       Add Unsigned                     MIPS I
SUB        Subtract                         MIPS I
SUBU       Subtract Unsigned                MIPS I
DADD       Doubleword Add                   MIPS III
DADDU      Doubleword Add Unsigned          MIPS III
DSUB       Doubleword Subtract              MIPS III
DSUBU      Doubleword Subtract Unsigned     MIPS III
SLT        Set Less Than                    MIPS I
SLTU       Set Less Than Unsigned           MIPS I
AND        AND                              MIPS I
OR         OR                               MIPS I
XOR        Exclusive OR                     MIPS I
NOR        NOR                              MIPS I
3.3.2.3 Shift Instructions
Table 3-8. Shift Instructions

Mnemonic   Description                                    Defined in
SLL        Shift Left Logical                             MIPS I
SRL        Shift Right Logical                            MIPS I
SRA        Shift Right Arithmetic                         MIPS I
SLLV       Shift Left Logical Variable                    MIPS I
SRLV       Shift Right Logical Variable                   MIPS I
SRAV       Shift Right Arithmetic Variable                MIPS I
DSLL       Doubleword Shift Left Logical                  MIPS III
DSRL       Doubleword Shift Right Logical                 MIPS III
DSRA       Doubleword Shift Right Arithmetic              MIPS III
DSLL32     Doubleword Shift Left Logical + 32             MIPS III
DSRL32     Doubleword Shift Right Logical + 32            MIPS III
DSRA32     Doubleword Shift Right Arithmetic + 32         MIPS III
DSLLV      Doubleword Shift Left Logical Variable         MIPS III
DSRLV      Doubleword Shift Right Logical Variable        MIPS III
DSRAV      Doubleword Shift Right Arithmetic Variable     MIPS III
3.3.2.4 Multiply and Divide Instructions
These are the standard MIPS instructions for multiply, divide, and move to / from the HI / LO registers, executed on the I0 pipeline's MAC unit. See also the discussion of the C790-specific Multiply and Divide instructions in 3.3.7.1.
Table 3-9. Multiply and Divide Instructions

Mnemonic   Description          Defined in
MULT       Multiply             MIPS I
MULTU      Multiply Unsigned    MIPS I
DIV        Divide               MIPS I
DIVU       Divide Unsigned      MIPS I
MFHI       Move From HI         MIPS I
MTHI       Move To HI           MIPS I
MFLO       Move From LO         MIPS I
MTLO       Move To LO           MIPS I
3.3.2.5 64-Bit Operations
The result of operations that use incorrectly sign-extended 32-bit values for 64-bit operations is unpredictable.
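A 64-bit register value is a correctly sign-extended 32-bit value when bits 63 through 31 are all equal. The sketch below shows one way software could test this; the helper names are hypothetical, and the check is given for illustration rather than as something the hardware performs.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32(value):
    """Sign-extend the low 32 bits of `value` to a 64-bit register image."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def is_sign_extended_32(value64):
    """True if bits 63..31 of the 64-bit value are all zeros or all ones."""
    upper = (value64 & MASK64) >> 31          # the 33 bits from 63 down to 31
    return upper == 0 or upper == (1 << 33) - 1


print(is_sign_extended_32(sign_extend_32(0x80000000)))
print(is_sign_extended_32(0x0000000080000000))
```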
3.3.3 Jump and Branch Instructions
The architecture defines PC-relative conditional branches, a PC-region unconditional jump, an absolute (register) unconditional jump, and a similar set of procedure calls that record a return link address in a general register. For convenience, these are all referred to here as branches. All branches have an architectural delay of one instruction. When a branch is taken, the instruction immediately following the branch instruction, in the branch delay slot, is executed before the branch to the target instruction takes place. Conditional branches come in two versions that treat the instruction in the delay slot differently when the branch is not taken and execution falls through. The 'branch' instructions execute the instruction in the delay slot, but the 'branch likely' instructions do not. (They are said to 'nullify' it.) By convention, if an exception or interrupt prevents the completion of an instruction occupying a branch delay slot, the instruction stream is continued by re-executing the branch instruction. To permit this, branches must be restartable; procedure calls may not use the register in which the return link is stored (usually register 31) to determine the branch target address. For detailed information on individual instructions, refer to Appendix A. Branch on Coprocessor instructions are covered in the coprocessor sections.
3.3.3.1 Jump Instructions
Subroutine calls in high-level languages are usually implemented with Jump or Jump and Link instructions, both of which are J-type instructions. In J-type format, the 26-bit target address is shifted left 2 bits and combined with the high-order 4 bits of the current program counter to form an absolute address. Returns, dispatches, and large cross-page jumps are usually implemented with the Jump Register or Jump and Link Register instructions. Both are R-type instructions that take the 32-bit byte address contained in one of the general purpose registers.
Table 3-10. Jump Instructions Jumping Within a 256 MByte Region

Mnemonic   Description      Defined in
J          Jump             MIPS I
JAL        Jump and Link    MIPS I
Table 3-11. Jump Instructions to Absolute Address

Mnemonic   Description               Defined in
JR         Jump Register             MIPS I
JALR       Jump and Link Register    MIPS I
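The J-type address formation described in 3.3.3.1 can be written as a single expression. In the sketch below, `pc` stands for the program counter value whose high-order 4 bits are used; the function name is illustrative.

```python
def jump_target(pc, target26):
    """Form a J-type absolute address.

    The 26-bit target is shifted left 2 bits and combined with the
    high-order 4 bits of the program counter value `pc`.
    """
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


# The shifted target supplies 28 address bits, which is why J and JAL
# stay within a 256 MByte (2**28 byte) region.
print(hex(jump_target(0x80001000, 0x0000040)))
print((1 << 28) // (1024 * 1024))
```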
3.3.3.2 Branch Instructions
All branch instruction target addresses are computed by adding the address of the instruction in the branch delay slot to the 16-bit offset (shifted left 2 bits and sign-extended to 32 bits). All branches occur with a delay of one instruction. In the case of a Branch Likely instruction, if the branch is not taken, the instruction in the delay slot is nullified.
Table 3-12. PC-Relative Conditional Branch Instructions Comparing 2 Registers

Mnemonic   Description                                     Defined in
BEQ        Branch on Equal                                 MIPS I
BNE        Branch on Not Equal                             MIPS I
BLEZ       Branch on Less Than or Equal to Zero            MIPS I
BGTZ       Branch on Greater Than Zero                     MIPS I
BEQL       Branch on Equal Likely                          MIPS II
BNEL       Branch on Not Equal Likely                      MIPS II
BLEZL      Branch on Less Than or Equal to Zero Likely     MIPS II
BGTZL      Branch on Greater Than Zero Likely              MIPS II
Table 3-13. PC-Relative Conditional Branch Instructions Comparing Against Zero

Mnemonic   Description                                                 Defined in
BLTZ       Branch on Less Than Zero                                    MIPS I
BGEZ       Branch on Greater Than or Equal to Zero                     MIPS I
BLTZAL     Branch on Less Than Zero and Link                           MIPS I
BGEZAL     Branch on Greater Than or Equal to Zero and Link            MIPS I
BLTZL      Branch on Less Than Zero Likely                             MIPS II
BGEZL      Branch on Greater Than or Equal to Zero Likely              MIPS II
BLTZALL    Branch on Less Than Zero and Link Likely                    MIPS II
BGEZALL    Branch on Greater Than or Equal to Zero and Link Likely     MIPS II
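The branch target calculation in 3.3.3.2 can be sketched as follows. The sketch assumes 4-byte instructions, so the delay slot sits at the branch address plus 4; the function name is illustrative.

```python
def branch_target(branch_address, offset16):
    """Delay-slot address plus the sign-extended 16-bit offset shifted left 2."""
    delay_slot = branch_address + 4        # assumes 32-bit (4-byte) instructions
    offset = offset16 & 0xFFFF
    if offset & 0x8000:                    # sign-extend the 16-bit offset
        offset -= 0x10000
    return (delay_slot + (offset << 2)) & 0xFFFFFFFF


# An offset of -1 branches back to the branch instruction itself.
print(hex(branch_target(0x1000, 0xFFFF)))
# An offset of +3 lands three instructions past the delay slot.
print(hex(branch_target(0x1000, 0x0003)))
```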
3.3.4 Miscellaneous Instructions
3.3.4.1 Exception Instructions
The sole purpose of exception instructions is to cause an exception that transfers control to a software exception handler in the kernel. System call and breakpoint instructions cause exceptions unconditionally. The trap instructions cause exceptions conditionally, based upon the result of a comparison. For details of these instructions, refer to the individual instruction descriptions in Appendix A.
Table 3-14. Exception Instructions

Mnemonic   Description                                     Defined in
BREAK      Breakpoint                                      MIPS I
SYSCALL    System Call                                     MIPS I
TGE        Trap if Greater or Equal                        MIPS II
TGEU       Trap if Greater or Equal Unsigned               MIPS II
TLT        Trap if Less Than                               MIPS II
TLTU       Trap if Less Than Unsigned                      MIPS II
TEQ        Trap if Equal                                   MIPS II
TNE        Trap if Not Equal                               MIPS II
TGEI       Trap if Greater or Equal Immediate              MIPS II
TGEIU      Trap if Greater or Equal Immediate Unsigned     MIPS II
TLTI       Trap if Less Than Immediate                     MIPS II
TLTIU      Trap if Less Than Immediate Unsigned            MIPS II
TEQI       Trap if Equal Immediate                         MIPS II
TNEI       Trap if Not Equal Immediate                     MIPS II
3.3.4.2 Serialization Instructions
The order in which memory accesses from load and store instructions appear outside the C790 is not specified by the architecture. The SYNC (or SYNC.L) instruction creates a point in the executing instruction stream at which the relative order of some loads and stores is known. Loads and stores executed before the SYNC (or SYNC.L) are retired before loads and stores after the SYNC (or SYNC.L) can start. To guarantee the completion of certain instructions, a SYNC.P instruction can be used. Instructions executed before a SYNC.P instruction are completed before instructions after the SYNC.P can start. For details, refer to the description of the SYNC instruction in Appendix A.
Table 3-15. Serialization Instructions

Mnemonic          Description        Defined in
SYNC (Note 2)     Synchronization    MIPS II

Note 2: This includes the SYNC, SYNC.L and SYNC.P instructions.
3.3.4.3 MIPS IV Instructions
The C790 supports a subset of the MIPS IV instructions: the Conditional Move instructions and the Prefetch instruction. Conditional move operations allow 'IF' statements to be represented without branches. The 'THEN' and 'ELSE' clauses are computed unconditionally and the results are placed in temporary registers. Conditional move operations then transfer the temporary results to their true register. The Prefetch instruction fetches data expected to be used in the near future and places it in the data cache. For details of these instructions, refer to the individual instruction descriptions in Appendix A.
Table 3-16. MIPS IV Instructions

Mnemonic   Description                     Defined in
MOVN       Move Conditional on Not Zero    MIPS IV
MOVZ       Move Conditional on Zero        MIPS IV
PREF       Prefetch                        MIPS IV
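The branch-free 'IF' pattern described above can be modelled in a few lines. The sketch below treats MOVN and MOVZ as functions that return the new value of the destination register, using the standard MIPS IV behaviour (move rs to rd when rt is non-zero, or zero, respectively); `select` is a hypothetical helper showing how a compiler might combine them.

```python
def movn(rd, rs, rt):
    """MOVN rd, rs, rt: rd receives rs if rt is not zero, else rd is unchanged."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """MOVZ rd, rs, rt: rd receives rs if rt is zero, else rd is unchanged."""
    return rs if rt == 0 else rd


def select(condition, then_value, else_value):
    """Branch-free 'result = then_value if condition else else_value'.

    Both clauses are already computed; the conditional move only picks
    which temporary ends up in the result register.
    """
    result = else_value
    result = movn(result, then_value, condition)
    return result


print(select(1, 10, 20), select(0, 10, 20))
```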
3.3.5 System Control Coprocessor (COP0) Instructions
COP0 instructions perform operations specifically on the System Control Coprocessor registers to manipulate the memory management, exception handling, performance monitor, and debug facilities of the processor. COP0 instructions are enabled if the processor is in Kernel mode, or if bit 28 (CU) is set in the Status register. Otherwise executing one of these instructions generates a Coprocessor Unusable Exception. For details of COP0 instructions refer to Appendix C.
Table 3-17. System Control Coprocessor Instructions

Mnemonic   Description                                               Defined in
BC0F       Branch on Coprocessor 0 False                             MIPS I
BC0T       Branch on Coprocessor 0 True                              MIPS I
BC0FL      Branch on Coprocessor 0 False Likely                      MIPS II
BC0TL      Branch on Coprocessor 0 True Likely                       MIPS II
CACHE      Cache Operation                                           R4000
DI         Disable Interrupt                                         C790
EI         Enable Interrupt                                          C790
ERET       Exception Return                                          R4000
TLBR       Read Indexed TLB Entry                                    R4000
TLBWI      Write Index TLB Entry                                     R4000
TLBWR      Write Random TLB Entry                                    R4000
TLBP       Probe TLB for Matching Entry                              R4000
MTC0       Move To System Control Coprocessor                        R4000
MFC0       Move From System Control Coprocessor                      R4000
MTPC       Move To Performance Counter                               C790
MFPC       Move From Performance Counter                             C790
MTPS       Move To Performance Event Specifier                       C790
MFPS       Move From Performance Event Specifier                     C790
MTBPC      Move To Breakpoint Control Register                       C790
MFBPC      Move From Breakpoint Control Register                     C790
MTDAB      Move To Data Address Breakpoint Register                  C790
MFDAB      Move From Data Address Breakpoint Register                C790
MTDABM     Move To Data Address Breakpoint Mask Register             C790
MFDABM     Move From Data Address Breakpoint Mask Register           C790
MTIAB      Move To Instruction Address Breakpoint Register           C790
MFIAB      Move From Instruction Address Breakpoint Register         C790
MTIABM     Move To Instruction Address Breakpoint Mask Register      C790
MFIABM     Move From Instruction Address Breakpoint Mask Register    C790
MTDVB      Move To Data Value Breakpoint Register                    C790
MFDVB      Move From Data Value Breakpoint Register                  C790
MTDVBM     Move To Data Value Breakpoint Mask Register               C790
MFDVBM     Move From Data Value Breakpoint Mask Register             C790
3.3.6 Coprocessor 1 (COP1)
Coprocessor instructions perform operations in their respective coprocessors. Coprocessor loads and stores are I-type, and coprocessor computational instructions have coprocessor-dependent formats. Coprocessor load and store instructions are summarized in 3.3.1.3.
3.3.6.1 Coprocessor 1 (COP1) Instructions
COP1 instructions are enabled if bit 29 (CU) is set in the Status register. Otherwise executing one of these instructions generates a Coprocessor Unusable Exception. For details of COP1 instructions refer to Appendix D.
Table 3-18. Coprocessor 1 Instructions

Mnemonic       Description                                              Defined in
BC1F           Branch on Floating Point False                           MIPS I
BC1T           Branch on Floating Point True                            MIPS I
LWC1           Load Word to Floating Point                              MIPS I
LDC1           Load Doubleword to Floating Point                        MIPS II
SWC1           Store Word from Floating Point                           MIPS I
SDC1           Store Doubleword from Floating Point                     MIPS II
MFC1           Move Word from Floating Point                            MIPS I
MTC1           Move Word to Floating Point                              MIPS I
DMFC1          Move Doubleword from Floating Point                      MIPS III
DMTC1          Move Doubleword to Floating Point                        MIPS III
CFC1           Move Control Word from Floating Point                    MIPS I
CTC1           Move Control Word to Floating Point                      MIPS I
CVT.D.fmt      Floating Point Convert to Double Floating Point          MIPS I, III
CVT.L.fmt      Floating Point Convert to Long Fixed Point               MIPS III
CVT.S.fmt      Floating Point Convert to Single Floating Point          MIPS I, III
CVT.W.fmt      Floating Point Convert to Word Fixed Point               MIPS I
ADD.fmt        Floating Point Add                                       MIPS I
SUB.fmt        Floating Point Subtract                                  MIPS I
MUL.fmt        Floating Point Multiply                                  MIPS I
DIV.fmt        Floating Point Divide                                    MIPS I
ABS.fmt        Floating Point Absolute                                  MIPS I
MOV.fmt        Floating Point Move                                      MIPS I
NEG.fmt        Floating Point Negate                                    MIPS I
SQRT.fmt       Floating Point Square Root                               MIPS II
C.cond.fmt     Floating Point Compare                                   MIPS I
CEIL.L.fmt     Floating Point Ceiling Convert to Long Fixed Point       MIPS III
CEIL.W.fmt     Floating Point Ceiling Convert to Word Fixed Point       MIPS II
FLOOR.L.fmt    Floating Point Floor Convert to Long Fixed Point         MIPS III
FLOOR.W.fmt    Floating Point Floor Convert to Word Fixed Point         MIPS II
ROUND.L.fmt    Floating Point Round to Long Fixed Point                 MIPS III
ROUND.W.fmt    Floating Point Round to Word Fixed Point                 MIPS II
TRUNC.L.fmt    Floating Point Truncate to Long Fixed Point              MIPS III
TRUNC.W.fmt    Floating Point Truncate to Word Fixed Point              MIPS II
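The coprocessor-usable conditions stated in 3.3.5 and 3.3.6.1 reduce to simple bit tests on the Status register. The sketch below uses the bit positions given in the text (bit 28 for COP0, bit 29 for COP1); the constant and function names are illustrative, and the processor mode is passed in as a flag rather than decoded from the register.

```python
CU0_BIT = 28   # Status register bit that enables COP0 outside Kernel mode
CU1_BIT = 29   # Status register bit that enables COP1


def cop0_usable(status, kernel_mode):
    """COP0 is usable in Kernel mode, or when bit 28 of Status is set."""
    return kernel_mode or bool((status >> CU0_BIT) & 1)


def cop1_usable(status):
    """COP1 is usable when bit 29 of Status is set."""
    return bool((status >> CU1_BIT) & 1)


status = 1 << CU1_BIT
# When a check fails, the instruction raises a Coprocessor Unusable exception.
print(cop0_usable(status, kernel_mode=False), cop1_usable(status))
```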
3.3.7 C790-Specific Instructions
The C790 extends the original MIPS instruction set. The following additional instructions are supported:
* Three-operand Multiply and Multiply/Add instructions
* Multiply instructions for Pipeline 1
* Multimedia instructions
* Enable interrupt and Disable interrupt instructions
For more information, refer to Appendices B and C.
3.3.7.1 Integer Multiply / Divide Instructions
The standard MIPS instructions for multiply, divide and move to / from HI / LO registers execute on the I0 pipeline's MAC unit. A complete set of new instructions has also been defined to execute on the I1 pipeline's MAC unit. All of these instructions are shown in the following table.
Table 3-19. C790-Specific Multiply and Divide Instructions

OpCode     Description
(Three Operand Multiply and Multiply-add)
MADD       Multiply/Add
MADDU      Multiply/Add Unsigned
MULT       Multiply (3-operand)
MULTU      Multiply Unsigned (3-operand)
(Multiply Instructions for Pipeline 1)
MULT1      Multiply 1
MULTU1     Multiply Unsigned 1
DIV1       Divide 1
DIVU1      Divide Unsigned 1
MADD1      Multiply/Add 1
MADDU1     Multiply/Add Unsigned 1
MFHI1      Move From HI 1
MFLO1      Move From LO 1
MTHI1      Move To HI 1
MTLO1      Move To LO 1
The C790 supports three-operand multiply instructions that store the multiply result in a general purpose register in addition to the LO register. Software therefore does not need an MFLO instruction to move the result from the LO register to a general purpose register.
* MULT rd, rs, rt      HI || LO = rs * rt (signed); rd = new LO contents
* MULTU rd, rs, rt     HI || LO = rs * rt (unsigned); rd = new LO contents
The C790 also supports new multiply-add instructions, MADD and MADDU. These instructions execute multiply-accumulate operations using the HI and LO registers as accumulators.
* MADD rd, rs, rt      HI || LO += rs * rt (signed); rd = new LO contents
* MADDU rd, rs, rt     HI || LO += rs * rt (unsigned); rd = new LO contents
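The HI || LO notation above can be modelled with a small accumulator class. This is a simplified sketch, not the hardware definition: it assumes 32-bit source operands and treats HI and LO as the upper and lower 32 bits of a 64-bit accumulator, ignoring any wider register behaviour described in Appendix A. The class and method names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of `value` as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


class MacUnit:
    """Simplified model of HI/LO as one 64-bit accumulator (an assumption)."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def _store(self, accumulator):
        accumulator &= MASK64
        self.hi = accumulator >> 32
        self.lo = accumulator & MASK32
        return self.lo                      # rd = new LO contents

    def _accumulator(self):
        return (self.hi << 32) | self.lo

    def mult(self, rs, rt):
        return self._store(to_signed32(rs) * to_signed32(rt))

    def multu(self, rs, rt):
        return self._store((rs & MASK32) * (rt & MASK32))

    def madd(self, rs, rt):
        return self._store(self._accumulator() + to_signed32(rs) * to_signed32(rt))

    def maddu(self, rs, rt):
        return self._store(self._accumulator() + (rs & MASK32) * (rt & MASK32))


mac = MacUnit()
rd = mac.mult(3, 4)      # HI || LO = 12
rd = mac.madd(2, 5)      # HI || LO += 10
print(rd, mac.hi, mac.lo)
```

Returning the new LO value from each method mirrors the three-operand form, in which rd receives the new LO contents without a separate MFLO.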
3.3.7.2 Multimedia Instructions
The C790 defines a new set of instructions to support multimedia applications. These instructions are shown in Table 3-20. Most of these instructions do parallel operations on data by combining the execution units of the two pipelines (I0 and I1). They form a 128-bit path and then do parallel operations on either two 64-bit data items, four 32-bit data items, eight 16-bit data items, or sixteen 8-bit data items. In order to support the 128-bit datapath, 128-bit load/store operations are also implemented.
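The parallel operations described above treat a 128-bit value as independent lanes. The sketch below illustrates the idea with a wraparound lane-wise add and a signed saturating lane-wise add; it is a behavioural illustration under that lane model, not the definition of any particular instruction, and all names are hypothetical.

```python
REGISTER_BITS = 128


def split_lanes(value, lane_bits):
    """Split a 128-bit value into lanes, least-significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask
            for shift in range(0, REGISTER_BITS, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Reassemble lanes (least-significant first) into a 128-bit value."""
    mask = (1 << lane_bits) - 1
    value = 0
    for position, lane in enumerate(lanes):
        value |= (lane & mask) << (position * lane_bits)
    return value


def parallel_add(a, b, lane_bits):
    """Lane-wise add with wraparound; no carry crosses a lane boundary."""
    sums = [x + y for x, y in zip(split_lanes(a, lane_bits),
                                  split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)


def parallel_add_signed_sat(a, b, lane_bits):
    """Lane-wise signed add, clamped to the lane's signed range."""
    low = -(1 << (lane_bits - 1))
    high = (1 << (lane_bits - 1)) - 1

    def signed(x):
        return x - (1 << lane_bits) if x >> (lane_bits - 1) else x

    sums = [min(high, max(low, signed(x) + signed(y)))
            for x, y in zip(split_lanes(a, lane_bits),
                            split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)


print(len(split_lanes(0, 16)))                      # halfword lanes per value
print(hex(parallel_add(0x00FF, 0x0001, 8)))         # byte lanes: no carry out
print(hex(parallel_add_signed_sat(0x7FFF, 1, 16)))  # clamps at the maximum
```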
Table 3-20. Multimedia Instructions

OpCode    Description
(Arithmetic)
PADDB     Parallel Add Byte
PSUBB     Parallel Subtract Byte
PADDH     Parallel Add Halfword
PSUBH     Parallel Subtract Halfword
PADDW     Parallel Add Word
PSUBW     Parallel Subtract Word
PADSBH    Parallel Add/Subtract Halfword
PADDSB    Parallel Add with Signed Saturation Byte
PSUBSB    Parallel Subtract with Signed Saturation Byte
PADDSH    Parallel Add with Signed Saturation Halfword
PSUBSH    Parallel Subtract with Signed Saturation Halfword
PADDSW    Parallel Add with Signed Saturation Word
PSUBSW    Parallel Subtract with Signed Saturation Word
PADDUB    Parallel Add with Unsigned Saturation Byte
PSUBUB    Parallel Subtract with Unsigned Saturation Byte
PADDUH    Parallel Add with Unsigned Saturation Halfword
PSUBUH    Parallel Subtract with Unsigned Saturation Halfword
PADDUW    Parallel Add with Unsigned Saturation Word
PSUBUW    Parallel Subtract with Unsigned Saturation Word
(Min/Max)
PMAXH     Parallel Maximum Halfword
PMINH     Parallel Minimum Halfword
PMAXW     Parallel Maximum Word
PMINW     Parallel Minimum Word
(Absolute)
PABSH     Parallel Absolute Halfword
PABSW     Parallel Absolute Word
(Multiply and Divide)
PMULTW    Parallel Multiply Word
PMULTUW   Parallel Multiply Unsigned Word
PDIVW     Parallel Divide Word
PDIVUW    Parallel Divide Unsigned Word
PMADDW    Parallel Multiply/Add Word
PMADDUW   Parallel Multiply/Add Unsigned Word
PMSUBW    Parallel Multiply/Subtract Word
PMFHI     Parallel Move From HI
PMFLO     Parallel Move From LO
PMTHI     Parallel Move To HI
PMTLO     Parallel Move To LO
PMULTH    Parallel Multiply Halfword
PMADDH    Parallel Multiply/Add Halfword
PMSUBH    Parallel Multiply/Subtract Halfword
PMFHL     Parallel Move From HI/LO
PMTHL     Parallel Move To HI/LO
PHMADH    Parallel Horizontal Multiply/Add Halfword
PHMSBH    Parallel Horizontal Multiply/Subtract Halfword
PDIVBW    Parallel Divide Broadcast Word
(SA Operation)
MFSA      Move from SA Register
MTSA      Move to SA Register
MTSAB     Move Byte Count to SA Register
MTSAH     Move Halfword Count to SA Register
(Shift)
PSLLH     Parallel Shift Left Logical Halfword
PSRLH     Parallel Shift Right Logical Halfword
PSRAH     Parallel Shift Right Arithmetic Halfword
PSLLW     Parallel Shift Left Logical Word
PSRLW     Parallel Shift Right Logical Word
PSRAW     Parallel Shift Right Arithmetic Word
PSLLVW    Parallel Shift Left Logical Variable Word
PSRLVW    Parallel Shift Right Logical Variable Word
PSRAVW    Parallel Shift Right Arithmetic Variable Word
(Logical)
PAND      Parallel AND
POR       Parallel OR
PXOR      Parallel XOR
PNOR      Parallel NOR
(Compare)
PCGTB     Parallel Compare for Greater Than Byte
PCEQB     Parallel Compare for Equal Byte
PCGTH     Parallel Compare for Greater Than Halfword
PCEQH     Parallel Compare for Equal Halfword
PCGTW     Parallel Compare for Greater Than Word
PCEQW     Parallel Compare for Equal Word
(Quadword Load Store)
LQ        Load Quadword
SQ        Store Quadword
(Pack/Extend)
PPACB     Parallel Pack To Byte
PPACH     Parallel Pack To Halfword
PINTEH    Parallel Interleave Even Halfword
PPACW     Parallel Pack To Word
PEXTUB    Parallel Extend Upper From Byte
PEXTLB    Parallel Extend Lower From Byte
PEXTUH    Parallel Extend Upper From Halfword
PEXTLH    Parallel Extend Lower From Halfword
PEXTUW    Parallel Extend Upper From Word
PEXTLW    Parallel Extend Lower From Word
PEXT5     Parallel Extend from 5 bits
PPAC5     Parallel Pack to 5 bits
(Others)
PCPYH     Parallel Copy Halfword
PCPYLD    Parallel Copy Lower Doubleword
PCPYUD    Parallel Copy Upper Doubleword
PREVH     Parallel Reverse Halfword
PINTH     Parallel Interleave Halfword
PEXEH     Parallel Exchange Even Halfword
PEXCH     Parallel Exchange Center Halfword
PEXEW     Parallel Exchange Even Word
PEXCW     Parallel Exchange Center Word
PROT3W    Parallel Rotate 3 word
QFSRV     Quadword Funnel Shift Right Variable
PLZCW     Parallel Leading Zero Count Word
3.4 User Instruction Latency and Repeat Rate
Table 3-21 shows the latencies and repeat rates for all user instructions executed in the I0, I1, BR, LS and C1 execution pipelines. Kernel instructions are not included, nor are instructions that are not issued to these execution pipelines. See Figure 2-1 and Figure 2-4 for the execution pipeline names.
Table 3-21. Latencies and Repeat Rates for User Instructions

Instruction Type          Execution  Latency  Repeat Rate  Comment
Integer Instructions
Add/Sub/Logical/Set       I0/I1      1        1
MF/MT/HI/LO               I0/I1      1        1
Shift/LUI                 I0/I1      1        1
Branch/Jump               BR         1        1
Conditional Move          I0/I1      1        1
MULT/MULTU                I0         4        2            Latency relative to Lo/Hi/GPR
MULT1/MULTU1              I1         4        2            Latency relative to Lo1/Hi1/GPR
DIV/DIVU                  I0         37       37           Latency relative to Lo/Hi
DIV1/DIVU1                I1         37       37           Latency relative to Lo1/Hi1
MADD/MADDU                I0         4        2            Latency relative to Lo/Hi/GPR
MADD1/MADDU1              I1         4        2            Latency relative to Lo1/Hi1/GPR
Load                      LS         1        1            Assuming cache hit
Store                     LS         -        1            Assuming cache hit
Multimedia Multiply       I0+I1      4        2
Multimedia Multiply/Add   I0+I1      4        2
Multimedia Divide         I0+I1      37       37
Floating-Point Instructions
ADD.S/SUB.S/C.cond.S      C1         6        2
ADD.D/SUB.D/C.cond.D      C1         8        2
ABS/NEG/MOV               C1         6        2
CVT                       C1         8        2
MUL.S                     C1         6        2
MUL.D                     C1         8        2
DIV.S                     C1         21       15
DIV.D                     C1         35       29
SQRT.S                    C1         21       15
SQRT.D                    C1         35       29
MFC1/MTC1                 C1+LS      2        1
DMFC1/DMTC1               C1+LS      2        1
CFC1/CTC1                 C1+LS      2        1
LWC1/LDC1                 C1+LS      2        1            Assuming cache hit
SWC1/SDC1                 C1+LS      -        1
Chapter 4 CPU and COP0 Registers
4. CPU and COP0 Registers
This chapter describes the CPU registers and the System Control Coprocessor (COP0) registers. The CPU registers group consists of:
* General Purpose Registers (GPRs),
* Multiply and Divide registers (HI and LO registers) that hold the results of integer multiply and divide,
* The SA register, which is used by the funnel shift instructions,
* The Program Counter (PC) register.
The COP0 registers control the processor state and report its status. These registers can be read using the MFC0 instruction and written using the MTC0 instruction.
4.1 CPU Registers
The central processing unit (CPU) provides the following registers:
* 32 128-bit General Purpose Registers (GPR)
* Four registers that hold the results of integer multiply and divide operations (HI0, LO0, HI1, and LO1)
* Shift Amount (SA) register
* Program Counter
The C790 has 128-bit-wide General Purpose Registers (GPRs). The upper 64 bits of the GPRs are only used by the C790-specific "Quad Load/Store", and "Multimedia (Parallel)" instructions.
HI0 and LO0 are the standard 64-bit HI and LO registers. HI1 and LO1, which are the upper 64 bits of the 128-bit HI and LO registers, are only used by the new multiply and divide instructions, such as MULT1, MULTU1, DIV1, DIVU1, MADD1, MADDU1, MFHI1, MFLO1, MTHI1, and MTLO1. All these instructions are equivalent to existing instructions which operate on HI0 and LO0 registers.
The Shift Amount (SA) register specifies the shift amount used by the funnel shift instruction. The shaded registers in Figure 4-1 are new architecturally-visible registers that are specific to the C790.
Figure 4-1. CPU Registers (General Purpose Registers $0 through $31, bits 127:0, with the standard registers in bits 63:0; HI and LO registers, with HI1 and LO1 in bits 127:64 and HI (HI0) and LO (LO0) in bits 63:0; SA register, bits 31:0; Program Counter, PC)
4.1.1 General Purpose Registers
The standard 64-bit CPU general purpose registers have been extended to 128-bit registers. New instructions have been defined to use the upper 64 bits of these registers. Two of the CPU general purpose registers have special assigned functions:
* r0 is hardwired to a value of zero, and can be used as the target register for any instruction whose result is to be discarded. r0 can also be used as a source when a zero value is needed.
* r31 is the link register used by the Jump and Link instructions. In general, it should not be used by other instructions.
4.1.2 HI and LO Registers
The standard 64-bit HI and LO registers have been extended to 128-bit registers. New instructions have been defined to use the upper 64 bits of these registers. HI0 and LO0 are the standard 64-bit HI and LO registers. HI1 and LO1 are the upper 64 bits of the 128-bit HI and LO registers. These four registers (HI0, LO0, HI1, LO1) store:
* the product of integer multiply operations, or
* the accumulation of integer multiply-accumulate operations, or
* the quotient (in LO0 or LO1) and remainder (in HI0 or HI1) of integer divide operations.
4.1.3 Shift Amount (SA) Register
The SA register specifies the shift amount used by the funnel shift instruction. This is a new architecturally-visible register and it needs to be saved and restored as part of the processor state. New instructions have been defined to move values between this register and the general purpose registers.
4.1.4 Program Counter (PC)
The Program Counter (PC) holds the address of the instruction which is being executed. The PC is incremented automatically by 4 when the executed instruction is not a control-transfer instruction (that is, not a branch, jump, ERET, SYSCALL, or TRAP). Control-transfer instructions change the value of the PC to the target address specified by them. An exception also changes the contents of the PC to the specified exception vector address.
4.2 System Control Coprocessor (COP0) Registers
COP0 registers are listed in Table 4-1.
Table 4-1. Coprocessor 0 Registers

No.  Register Name  Purpose    Description
0    Index          MMU        Programmable register to select TLB entry for reading or writing
1    Random         MMU        Pseudo-random counter for TLB replacement
2    EntryLo0       MMU        Low half of TLB entry for even PFN (Physical page number)
3    EntryLo1       MMU        Low half of TLB entry for odd PFN (Physical page number)
4    Context        Exception  Pointer to kernel virtual PTE table in 32-bit addressing mode
5    PageMask       MMU        Mask that sets the TLB page size
6    Wired          MMU        Number of wired TLB entries
7    (Reserved)     Undefined  Undefined
8    BadVAddr       Exception  Bad virtual address
9    Count          Exception  Timer count
10   EntryHi        MMU        High half of TLB entry (Virtual page number and ASID)
11   Compare        Exception  Timer compare
12   Status         Exception  Processor Status Register
13   Cause          Exception  Cause of the last exception taken
14   EPC            Exception  Exception Program Counter
15   PRId           MMU        Processor Revision Identifier
16   Config         MMU        Configuration Register
17   (Reserved)     Undefined  Undefined
18   (Reserved)     Undefined  Undefined
19   (Reserved)     Undefined  Undefined
20   (Reserved)     Undefined  Undefined
21   (Reserved)     Undefined  Undefined
22   (Reserved)     Undefined  Undefined
23   BadPAddr       Exception  Bad physical address
24   Debug          Debug      This is used for Debug function
25   Perf           Exception  Performance Counter and Control Register
26   (Reserved)     Undefined  Undefined
27   (Reserved)     Undefined  Undefined
28   TagLo          Cache      Cache Tag register (low bits)
29   TagHi          Cache      Cache Tag register (high bits)
30   ErrorEPC       Exception  Error Exception Program Counter
31   (Reserved)     Undefined  Undefined
4.2.1 Index Register (0)
Figure 4-2. Index Register (bit 31: P, 1 bit; bits 30:6: 0, 25 bits; bits 5:0: Index, 6 bits)
The Index register is a 32-bit read/write register containing six bits to index an entry in the TLB. The high-order bit of the register records the success or failure of a TLB Probe (TLBP) instruction. The Index register also specifies the TLB entry affected by TLB Read (TLBR) or TLB Write Index (TLBWI) instructions. Figure 4-2 shows the format of the Index register; Table 4-2 describes the Index register fields.
Table 4-2. Index Register Field Description

Field  Bits  Type        Initial Value  Description
P      31    Read/Write  Undefined      Probe failure. Set to 1 when the previous TLB Probe (TLBP) instruction was unsuccessful.
Index  5:0   Read/Write  Undefined      Index to the TLB entry affected by the TLB Read and TLB Write instructions.
0      30:6  Read-only   0              Reserved. Must be written as zeroes, and returns zeroes when read.
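As an illustration of the field layout in Table 4-2, the sketch below extracts P and Index from a raw 32-bit register value. The function names are hypothetical, and the value passed in stands for whatever software has read from the Index register.

```python
def decode_index(reg):
    """Split a raw Index register value into its P and Index fields."""
    return {
        "P": (reg >> 31) & 1,   # bit 31: 1 means the last TLBP found no match
        "Index": reg & 0x3F,    # bits 5:0: TLB entry number
    }


def encode_index(index):
    """Build a value for the Index register; reserved bits 30:6 stay zero."""
    if not 0 <= index <= 0x3F:
        raise ValueError("Index field is six bits wide")
    return index
```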
4.2.2 Random Register (1)
Figure 4-3. Random Register (bits 31:6: 0, 26 bits; bits 5:0: Random, 6 bits)
The Random register is a read-only register. The least significant six bits index an entry in the TLB. This register decrements on every cycle in which an instruction is executed. Its value ranges between an upper and a lower bound, as follows:
* A lower bound is set by the number of TLB entries reserved for exclusive use by the operating system (the contents of the Wired register).
* An upper bound is set by the total number of TLB entries (47 maximum).
The Random register specifies the entry in the TLB that is affected by the TLB Write Random (TLBWR) instruction. The register does not need to be read for this purpose; however, the register is readable to verify proper operation of the processor. To simplify testing, the Random register is set to the value of the upper bound upon system reset. This register is also set to the upper bound when the Wired register is written. Figure 4-3 shows the format of the Random Register; Table 4-3 describes the Random Register fields.
Table 4-3. Random Register Fields

Field   Bits  Type       Initial Value     Description
Random  5:0   Read-only  Upper bound (47)  TLB Random index.
0       31:6  Read-only  0                 Reserved. Must be written as zeros, and returns zeroes when read.
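The bounds on the Random register can be pictured with a small model. This is a sketch under one stated assumption: that the counter returns to the upper bound after it has reached the lower bound (the text gives the bounds but does not spell out the wrap). The class name `RandomRegisterModel` is hypothetical.

```python
TLB_TOP = 47  # upper bound: highest TLB entry number


class RandomRegisterModel:
    """Toy model of the Random register bounded by Wired and TLB_TOP."""

    def __init__(self, wired=0):
        self.wired = wired
        self.value = TLB_TOP            # reset value is the upper bound

    def tick(self):
        """One decrement step; wraps to the upper bound (assumption)."""
        if self.value <= self.wired:
            self.value = TLB_TOP
        else:
            self.value -= 1

    def write_wired(self, wired):
        """Writing Wired also sets Random back to the upper bound."""
        if not 0 <= wired <= TLB_TOP:
            raise ValueError("values above 47 give undefined results")
        self.wired = wired
        self.value = TLB_TOP
```

In this model the value always stays within the range from the Wired value to 47, so a TLB Write Random never selects a wired entry.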
4.2.3 EntryLo0 Register (2), and EntryLo1 Register (3)
Figure 4-4. EntryLo0 and EntryLo1 Registers (both registers: bits 31:26: 0, 6 bits; bits 25:6: PFN, 20 bits; bits 5:3: C, 3 bits; bit 2: D, 1 bit; bit 1: V, 1 bit; bit 0: G, 1 bit)
The EntryLo0 and EntryLo1 registers consist of two registers that have a similar format:
* EntryLo0 is used for even virtual pages.
* EntryLo1 is used for odd virtual pages.
The EntryLo0 and EntryLo1 registers are read/write registers. They hold the physical page frame number (PFN) of the TLB entry for even and odd pages, respectively, when performing TLB read and write operations. Figure 4-4 shows the format of the EntryLo0 and EntryLo1 Registers; Table 4-4 describes the EntryLo0 and EntryLo1 Register fields.
Table 4-4. EntryLo0 and EntryLo1 Register Fields

Field  Bits   Type        Initial Value  Description
PFN    25:6   Read/Write  Undefined      Page frame number; the upper bits of the physical address.
C      5:3    Read/Write  Undefined      Specifies the TLB page coherency attribute.
                                           000 (0): Reserved
                                           001 (1): Reserved
                                           010 (2): Uncached
                                           011 (3): Cacheable, write-back, write allocate
                                           100 (4): Reserved
                                           101 (5): Reserved
                                           110 (6): Reserved
                                           111 (7): Uncached Accelerated
D      2      Read/Write  Undefined      Dirty. If this bit is set, the page is marked as dirty and therefore writable. This bit is actually a write-protect bit that software can use to prevent alteration of data.
V      1      Read/Write  Undefined      Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise, a TLBL or TLBS miss will occur.
G      0      Read/Write  Undefined      Global. If this bit is set in both EntryLo0 and EntryLo1, then the processor ignores the ASID during TLB look-up.
0      31:26  Read-only   0              Reserved. Must be written as zeroes, and returns zeroes when read. EntryLo0[31] is reserved for Kernel use. It contains the written value. This bit has no effect on any CPU or TLB operation.
Reserved codes in the C field may not be written correctly into a TLB entry by the TLBWI or TLBWR instruction.
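The EntryLo field layout can be exercised with a pair of helper functions. This is a minimal sketch: the function and constant names are hypothetical, and rejecting the reserved C codes in software is a design choice of the example, prompted by the note above.

```python
C_UNCACHED = 2
C_CACHED_WRITE_BACK = 3
C_UNCACHED_ACCELERATED = 7
VALID_C_CODES = (C_UNCACHED, C_CACHED_WRITE_BACK, C_UNCACHED_ACCELERATED)


def pack_entrylo(pfn, c, dirty, valid, global_bit):
    """Build an EntryLo0/EntryLo1 value from its fields."""
    if c not in VALID_C_CODES:
        raise ValueError("reserved coherency code")
    return (((pfn & 0xFFFFF) << 6)      # PFN, bits 25:6
            | (c << 3)                  # C, bits 5:3
            | (int(bool(dirty)) << 2)   # D, bit 2
            | (int(bool(valid)) << 1)   # V, bit 1
            | int(bool(global_bit)))    # G, bit 0


def unpack_entrylo(reg):
    """Split an EntryLo value back into its fields."""
    return {
        "PFN": (reg >> 6) & 0xFFFFF,
        "C": (reg >> 3) & 0x7,
        "D": (reg >> 2) & 1,
        "V": (reg >> 1) & 1,
        "G": reg & 1,
    }
```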
4.2.4 Context Register (4)
Figure 4-5. Context Register Format (bits 31:23: PTEBase, 9 bits; bits 22:4: BadVPN2, 19 bits; bits 3:0: 0, 4 bits)
The Context register is a read/write register containing the pointer to an entry in the page table entry (PTE) array. This array is an operating system data structure that stores virtual-to-physical address translations. When there is a TLB miss, the CPU loads the TLB with the missing translation from the PTE array. Normally, the operating system uses the Context register to address the current page map which resides in the kernel-mapped segment, kseg3. The Context register duplicates some of the information provided in the BadVAddr register, but the information is arranged in a form that is more useful for a software TLB exception handler. Figure 4-5 shows the format of the Context register; Table 4-5 describes the Context register fields.
Table 4-5. Context Register Fields

Field    Bits   Type        Initial Value  Description
PTEBase  31:23  Read/Write  Undefined      This field is a read/write field for use by the operating system. It is normally written with a value that allows the operating system to use the Context register as a pointer into the current PTE array in memory.
BadVPN2  22:4   Read-only   Undefined      This field is written by hardware on a miss. It contains the virtual page number (VPN) of the most recent virtual address that did not have a valid translation.
0        3:0    Read-only   0              Reserved. Must be written as zeros, and returns zeroes when read.
The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that caused the TLB miss; bit 12 is excluded because a single TLB entry maps to an even-odd page pair. For a 4 KB page size, this format can directly address the pair-table of 8-byte PTEs. For other page and PTE sizes, shifting and masking this value produces the appropriate address.
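The arithmetic behind this layout can be shown with a short sketch that computes the value the Context register would hold for a given PTEBase and faulting address. The function name `context_value` is hypothetical; the bit positions come from Table 4-5.

```python
def context_value(pte_base, bad_vaddr):
    """Combine PTEBase (bits 31:23) with BadVPN2 (bits 22:4)."""
    bad_vpn2 = (bad_vaddr >> 13) & 0x7FFFF   # virtual address bits 31:13
    return ((pte_base & 0x1FF) << 23) | (bad_vpn2 << 4)
```

Because BadVPN2 sits at bit 4, each even-odd page pair advances the value by 16 bytes, which is the size of two 8-byte PTEs. Two addresses that differ only in bit 12, such as 0x2000 and 0x3000, therefore give the same result.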
4.2.5 PageMask Register (5)
Figure 4-6. PageMask Register (bits 31:25: 0, 7 bits; bits 24:13: MASK, 12 bits; bits 12:0: 0, 13 bits)
The PageMask register is a read/write register used for reading or writing the TLB. It holds a comparison mask that sets the variable page size for each TLB entry, as shown in Table 4-6.
Table 4-6. PageMask Register Field

Field  Bits         Type        Initial Value  Description
MASK   24:13        Read/Write  Undefined      Page comparison mask.
                                                 0000 0000 0000: Page Size = 4 Kbytes
                                                 0000 0000 0011: Page Size = 16 Kbytes
                                                 0000 0000 1111: Page Size = 64 Kbytes
                                                 0000 0011 1111: Page Size = 256 Kbytes
                                                 0000 1111 1111: Page Size = 1 Mbytes
                                                 0011 1111 1111: Page Size = 4 Mbytes
                                                 1111 1111 1111: Page Size = 16 Mbytes
0      31:25, 12:0  Read-only   0              Reserved. Must be written as zeros, and returns zeroes when read.
TLB read and write operations use this register as either a source or a destination; when virtual addresses are presented for translation into physical address, the corresponding bits in the TLB identify which virtual address bits among bits 24:13 are used in the comparison. When the Mask field is not one of the values shown in Table 4-6, the operation of the TLB is undefined.
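The encodings in Table 4-6 follow a regular pattern: the page size in bytes equals (MASK + 1) shifted left by 12. The sketch below relies on that observation and rejects any MASK value not listed in the table; the function name is hypothetical.

```python
VALID_MASKS = (0x000, 0x003, 0x00F, 0x03F, 0x0FF, 0x3FF, 0xFFF)


def page_size_from_pagemask(reg):
    """Return the page size in bytes selected by a PageMask register value."""
    mask = (reg >> 13) & 0xFFF            # MASK field, bits 24:13
    if mask not in VALID_MASKS:
        raise ValueError("MASK value not listed in Table 4-6")
    return (mask + 1) << 12               # 0x000 gives 4 Kbytes
```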
4.2.6 Wired Register (6)
Figure 4-7. Wired Register (bits 31:6: 0, 26 bits; bits 5:0: Wired, 6 bits)
The Wired register is a read/write register that specifies the boundary between the wired and random entries of the TLB as shown in Figure 4-8. Wired entries are fixed, nonreplaceable entries which cannot be overwritten by a TLB write operation. Random entries can be overwritten. Figure 4-7 shows the format of the Wired register. Table 4-7 describes the register fields. The Wired register is set to 0 upon system reset. Writing this register also sets the Random register to the value of its upper bound as shown in Figure 4-8.
Figure 4-8. Wired Register Boundary (TLB entries 0 through 47: wired entries lie below the Wired register value; random entries run from the Wired register value up to 47)
Writing a value greater than 47 into this register produces undefined results.
Table 4-7. Wired Register Field Descriptions

Field  Bits  Type        Initial Value  Description
Wired  5:0   Read/Write  0              TLB Wired boundary (the number of wired TLB entries)
0      31:6  Read-only   0              Reserved. Must be written as zeros, and returns zeroes when read.
4.2.7 BadVAddr Register (8)
Figure 4-9. BadVAddr Register (bits 31:0: BadVAddr, 32 bits)
The Bad Virtual Address register (BadVAddr) is a read-only register that displays the most recent virtual address that caused one of the following exceptions: TLB Invalid, TLB Modified, TLB Refill, or Address Error exceptions. Figure 4-9 shows the format of the BadVAddr register; Table 4-8 describes the register fields.
Table 4-8. BadVAddr Register Field

Field     Bits  Type       Initial Value  Description
BadVAddr  31:0  Read-only  Undefined      The most recent virtual address that caused a TLB Invalid, TLB Modified, TLB Refill, or Address Error exception.
Note: The BadVAddr register does not save any information for bus errors, since bus errors are not addressing errors.
4.2.8 Count Register (9)
Figure 4-10. Count Register (bits 31:0: Count, 32 bits)
The Count register acts as a real-time timer. It is incremented every CPU clock cycle. The timer interrupt signaled through IP[7] can be disabled through the interrupt mask bit, IM[7]. This register can be read or written. Figure 4-10 shows the format of the Count register. Table 4-9 describes the register fields.
Table 4-9. Count Register Field

Field  Bits  Type        Initial Value  Description
Count  31:0  Read/Write  Undefined      32-bit timer, incrementing at the CPU clock rate.
4.2.9 EntryHi Register (10)
Figure 4-11. EntryHi Register (bits 31:13: VPN2, 19 bits; bits 12:8: 0, 5 bits; bits 7:0: ASID, 8 bits)
The EntryHi register holds the high-order bits of a TLB entry for TLB read and write operations. The EntryHi register is accessed by the TLB Probe, TLB Write Random, TLB Write Indexed, and TLB Read Indexed instructions. When either a TLB Refill, TLB Invalid, or TLB Modified exception occurs, the EntryHi register is loaded with the virtual page number (VPN2) and the ASID of the virtual address that did not have a matching TLB entry. Figure 4-11 shows the format of the EntryHi register. Table 4-10 describes the register fields.
Table 4-10. EntryHi Register Fields

Field  Bits   Type        Initial Value  Description
VPN2   31:13  Read/Write  Undefined      Virtual page number divided by two (maps to two pages).
ASID   7:0    Read/Write  Undefined      Address space ID field. An 8-bit field that lets multiple processes share the TLB; each process can have a distinct mapping of otherwise identical virtual page numbers.
0      12:8   Read-only   0              Reserved. Must be written as zeroes, and returns zeroes when read.
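As a sketch of the layout in Table 4-10, the helpers below build an EntryHi value from a virtual address and an ASID, and split one apart again. The function names are hypothetical; taking VPN2 directly from virtual address bits 31:13 follows the field positions in the table.

```python
def pack_entryhi(vaddr, asid):
    """Keep virtual address bits 31:13 as VPN2 and place the ASID in bits 7:0."""
    return (vaddr & 0xFFFFE000) | (asid & 0xFF)


def unpack_entryhi(reg):
    """Split an EntryHi value into VPN2 and ASID; bits 12:8 are reserved."""
    return {"VPN2": (reg >> 13) & 0x7FFFF, "ASID": reg & 0xFF}
```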
4.2.10 Compare Register (11)
Figure 4-12. Compare Register (bits 31:0: Compare, 32 bits)
The Compare register acts as a timer (see also the Count register); it maintains a stable value that does not change on its own. When the value of the Count register equals the value of the Compare register, interrupt bit IP[7] in the Cause register is set. This causes an interrupt as soon as the interrupt is enabled. Writing a value to the Compare register, as a side effect, clears the timer interrupt. For diagnostic purposes, the Compare register is a read/write register. In normal use, however, the Compare register is write-only. Figure 4-12 shows the format of the Compare register. Table 4-11 describes the register fields.
Table 4-11. Compare Register Field

Field    Bits  Type        Initial Value  Description
Compare  31:0  Read/Write  Undefined      The Compare register holds a stable value that is compared to the Count register. When the value of the Count register equals the value of the Compare register, interrupt IP[7] occurs.
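The interaction between Count and Compare can be illustrated with a toy model. This is a sketch only: one `tick` stands for one CPU clock, the 32-bit wraparound follows from the register width, and the names `TimerModel` and `next_compare` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def next_compare(count, interval):
    """Compare value that matches `interval` clocks after `count`, modulo 2**32."""
    return (count + interval) & MASK32


class TimerModel:
    """Toy model of the Count/Compare pair and the IP[7] pending bit."""

    def __init__(self, count=0, compare=0):
        self.count = count & MASK32
        self.compare = compare & MASK32
        self.ip7 = False

    def tick(self):
        """Advance Count by one clock and set IP[7] when it equals Compare."""
        self.count = (self.count + 1) & MASK32
        if self.count == self.compare:
            self.ip7 = True

    def write_compare(self, value):
        """Writing Compare clears the timer interrupt as a side effect."""
        self.compare = value & MASK32
        self.ip7 = False
```

A periodic timer handler would typically call `write_compare(next_compare(...))` each time the interrupt fires, which both schedules the next match and clears the pending bit.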
4.2.11 Status Register (12)
Figure 4-13. Status Register (bits 31:28: CU (CU[3:0]); bit 27: 0; bit 26: FR; bits 25:24: 0; bit 23: DEV; bit 22: BEV; bits 21:19: 0; bit 18: CH; bit 17: EDI; bit 16: EIE; bit 15: IM[7]; bits 14:13: 0; bit 12: BEM; bits 11:10: IM[3:2]; bits 9:5: 0; bits 4:3: KSU; bit 2: ERL; bit 1: EXL; bit 0: IE)
The Status register (SR) is a read/write register that contains the operating mode, interrupt enabling, and the diagnostic states of the processor. Figure 4-13 shows the format of the Status register. Some of the more important Status register fields are:
* The 3-bit Interrupt Mask (IM) field controls the enabling of three interrupt signals. Interrupts must be enabled before they can be asserted. Interrupts are recognized by the processor when the corresponding bits are set in both the Interrupt Mask and the Interrupt Enable fields of the Status register and the Interrupt Pending field of the Cause register. The C790 does not support software interrupts. IM[7] corresponds to the internal timer interrupt and IM[3:2] correspond to the Int[1:0] signals.
* The 4-bit Coprocessor Usability (CU) field (CU[3:0]) controls the usability of four possible coprocessors. Regardless of the CU[0] bit setting, COP0 is always usable in Kernel mode. For all other cases, an access to an unusable coprocessor causes an exception. The C790 supports coprocessor 1 (FPU).
4.2.11.1 Status Register Format
Table 4-12 describes the Status register fields. All bits in the Status register are readable and writable.

Table 4-12. Status Register Fields

Field          Bits                              Type        Initial Value  Description
CU (CU[3:0])   31:28                             Read/Write  Undefined      Controls the usability of each of the four coprocessor unit numbers. COP0 is always usable when in Kernel mode, regardless of the setting of the CU[0] bit. 1: usable; 0: unusable.
FR             26                                Read/Write  0              Enable additional floating point registers. 0: 16 registers; 1: 32 registers.
DEV            23                                Read/Write  Undefined      Controls the location of Performance counter and debug/SIO exception vectors. 0: normal; 1: bootstrap.
BEV            22                                Read/Write  1              Controls the location of TLB refill and general exception vectors. 0: normal; 1: bootstrap.
CH             18                                Read/Write  Undefined      Cache Hit (tag match and valid state) or Miss indication for last CACHE Hit Invalidate and CACHE Hit Write-back Invalidate for the Data cache. 0: miss; 1: hit.
EDI            17                                Read/Write  Undefined      EI/DI instruction Enable: When this bit is set, the EI and DI instructions can operate in User, Supervisor and Kernel modes and as such set or clear the EIE bit to enable or disable all interrupts (except NMI). When this bit is cleared, EI and DI operate as NOPs in User and Supervisor modes and execute properly in Kernel mode.
EIE            16                                Read/Write  Undefined      Enable IE: This bit enables or disables the IE (Interrupt Enable) bit. This bit is cleared by the DI instruction and set by the EI instruction. 0: disables all interrupts regardless of the value of the IE bit; 1: enables the IE bit. (All interrupts are enabled if IE=1, EXL=0, and ERL=0.) Note: IM enables individual interrupts.
IM[7,3:2]      15, 11:10                         Read/Write  Undefined      Interrupt Mask: controls the enabling of each of the external and internal interrupts. An interrupt is taken if interrupts are enabled, and the corresponding bits are set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register. 0: disabled; 1: enabled. Note: The enabling of this bit is valid only when EIE=1, IE=1, EXL=0 and ERL=0.
BEM            12                                Read/Write  Undefined      Bus Error Mask: controls the updating of the BadPAddr register and signaling a bus error exception. 0: update BadPAddr and signal a bus error exception; 1: do not update BadPAddr and stop signaling a bus error exception. This bit is set to 1 when it is a 0 and a bus error is signaled.
KSU            4:3                               Read/Write  Undefined      Kernel/Supervisor/User Mode bits. 00 (binary): Kernel; 01: Supervisor; 10: User; 11: Reserved.
ERL            2                                 Read/Write  1              Error Level: set by the processor when Reset, NMI, performance counter, SIO or debug exception is taken. 0: normal; 1: error.
EXL            1                                 Read/Write  Undefined      Exception Level: set by the processor when any exception other than Reset, NMI, performance counter, or debug exception is taken. 0: normal; 1: exception.
IE             0                                 Read/Write  Undefined      Interrupt Enable. 0: disables all interrupts; 1: enables all interrupts (if EIE=1, ERL=0, and EXL=0).
0              27, 25:24, 21:19, 14:13, 9:5      Read-only   0              Reserved. Must be written as zeroes, and returns zeroes when read.
4.2.11.2 Status Register Modes and Access States
Fields of the Status register set the modes and access states below.
Interrupt Enable: Interrupts are enabled when all of the following conditions are true:
* Status.IE = 1, and
* Status.EIE = 1, and
* Status.EXL = 0, and
* Status.ERL = 0
If these conditions are met, setting the IM bits enables the appropriate interrupts.
SIO Enable: A level 2 exception by SIO is enabled when the following condition is true:
* Status.ERL = 0
If this condition is met, asserting the SIO signal causes a Debug exception to occur.
Operating Modes: The following CPU Status register bit settings are required for User, Kernel, and Supervisor modes.
* The processor is in User mode when KSU = 10 (binary) and EXL = 0 and ERL = 0.
* The processor is in Supervisor mode when KSU = 01 (binary) and EXL = 0 and ERL = 0.
* The processor is in Kernel mode when KSU = 00 (binary) or EXL = 1 or ERL = 1.
Kernel Address Space Accesses: Access to the kernel address space is allowed when the processor is in Kernel mode.
Supervisor Address Space Accesses: Access to the supervisor address space is allowed when the processor is in Kernel mode or Supervisor mode, as described above.
User Address Space Accesses: Access to the user address space is allowed in Kernel, Supervisor, and User modes.
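The mode and interrupt-enable conditions above can be written as two small predicates over a raw Status value. This is a sketch for illustration; the function names are hypothetical and the bit positions are those of IE (bit 0), EXL (bit 1), ERL (bit 2), KSU (bits 4:3) and EIE (bit 16).

```python
def operating_mode(status):
    """Derive the operating mode from KSU, EXL and ERL."""
    exl = (status >> 1) & 1
    erl = (status >> 2) & 1
    ksu = (status >> 3) & 0x3
    if exl or erl or ksu == 0b00:
        return "kernel"
    if ksu == 0b01:
        return "supervisor"
    if ksu == 0b10:
        return "user"
    return "reserved"           # KSU = 11 is a reserved encoding


def interrupts_enabled(status):
    """True when IE = 1, EIE = 1, EXL = 0 and ERL = 0."""
    ie = status & 1
    exl = (status >> 1) & 1
    erl = (status >> 2) & 1
    eie = (status >> 16) & 1
    return ie == 1 and eie == 1 and exl == 0 and erl == 0
```

Note that setting EXL or ERL forces Kernel mode whatever KSU holds, and also masks interrupts, which is why both predicates test those bits.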
4.2.12 Cause Register (13)
Figure 4-14. Cause Register (bit 31: BD; bit 30: BD2; bits 29:28: CE; bits 27:19: 0; bits 18:16: EXC2; bit 15: IP[7]; bits 14:13: 0; bit 12: SIOP; bits 11:10: IP[3:2]; bits 9:7: 0; bits 6:2: ExcCode; bits 1:0: 0)
The 32-bit read-only Cause register describes the cause of the most recent exception. Figure 4-14 shows the fields of this register. Table 4-13 describes the Cause register fields. All bits in the Cause register are read-only.
Table 4-13. Cause Register Fields

Field      Bits                      Type       Initial Value        Description
BD         31                        Read-only  Undefined            Set by the processor when any exception other than Reset, NMI, performance counter, or debug occurs and is taken in a branch delay slot. 1: delay slot; 0: normal.
BD2        30                        Read-only  Undefined            Indicates whether the last NMI, performance counter, debug, or SIO exception taken occurred in a branch delay slot. 1: delay slot; 0: normal.
CE         29:28                     Read-only  Undefined            Coprocessor unit number referenced when a Coprocessor Unusable exception is taken.
EXC2       18:16                     Read-only  Undefined            Indicates the exception codes for level 2 exceptions (Performance Counter, Reset, Debug, SIO and NMI exceptions).
                                                                       000 (0): Res (Reset)
                                                                       001 (1): NMI (Non-maskable Interrupt)
                                                                       010 (2): PerfC (Performance Counter)
                                                                       011 (3): Dbg (Debug) and SIO (SIO)
                                                                       1xx (4-7): Reserved
IP[7,3:2]  15, 11:10                 Read-only  Undefined, Int[1:0]  Indicates an interrupt is pending. 1: interrupt pending; 0: no interrupt.
SIOP       12                        Read-only  SIO                  Indicates an SIO signal is pending. 1: SIO signal is pending; 0: no SIO signal is pending.
ExcCode    6:2                       Read-only  Undefined            Exception code field.
                                                                       00000 (0): Int (Interrupt)
                                                                       00001 (1): Mod (TLB modification exception)
                                                                       00010 (2): TLBL (TLB exception (load or instruction fetch))
                                                                       00011 (3): TLBS (TLB exception (store))
                                                                       00100 (4): AdEL (Address error exception (load or instruction fetch))
                                                                       00101 (5): AdES (Address error exception (store))
                                                                       00110 (6): IBE (Bus error exception (instruction fetch))
                                                                       00111 (7): DBE (Bus error exception (data reference: load or store))
                                                                       01000 (8): Sys (Syscall exception)
                                                                       01001 (9): Bp (Breakpoint exception)
                                                                       01010 (10): RI (Reserved instruction exception)
                                                                       01011 (11): CpU (Coprocessor Unusable exception)
                                                                       01100 (12): Ov (Arithmetic overflow exception)
                                                                       01101 (13): Tr (Trap exception)
                                                                       01110 (14): Reserved
                                                                       01111 (15): FPE (Floating-Point exception)
                                                                       (16-31): Reserved
0          27:19, 14:13, 9:7, 1:0    Read-only  0                    Reserved. Must be written as zeroes, and returns zeroes when read.
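An exception handler usually starts by decoding the Cause register. The sketch below shows one way to extract the main fields and map ExcCode to the mnemonics listed in Table 4-13; the function name `decode_cause` and the dictionary layout are hypothetical.

```python
EXC_CODES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES",
    6: "IBE", 7: "DBE", 8: "Sys", 9: "Bp", 10: "RI", 11: "CpU",
    12: "Ov", 13: "Tr", 15: "FPE",
}


def decode_cause(cause):
    """Extract BD, CE, IP[7] and ExcCode from a raw Cause register value."""
    code = (cause >> 2) & 0x1F                      # ExcCode, bits 6:2
    return {
        "BD": (cause >> 31) & 1,                    # exception in a delay slot
        "CE": (cause >> 28) & 0x3,                  # coprocessor unit number
        "IP7": (cause >> 15) & 1,                   # timer interrupt pending
        "ExcCode": code,
        "name": EXC_CODES.get(code, "Reserved"),    # 14 and 16-31 are reserved
    }
```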
4.2.13 EPC Register (14)
Figure 4-15. EPC Register (bits 31:0: EPC, 32 bits)
The Exception Program Counter (EPC) is a read/write register that contains the address at which processing resumes after an exception has been serviced. For synchronous exceptions, the EPC register contains either:
* the virtual address of the instruction that was the direct cause of the exception, or
* the virtual address of the immediately preceding branch or jump instruction (when the instruction is in a branch delay slot, and the BD bit in the Cause register is set).
On the occurrence of an exception, if the EXL bit in the Status register is set to a 1, the processor does not update the EPC register. Figure 4-15 shows the format of the EPC register. Table 4-14 describes the EPC register fields.
Table 4-14. EPC Register Field

Field  Bits  Type        Initial Value  Description
EPC    31:0  Read/Write  Undefined      Contains the address at which processing can resume after an exception has been serviced.
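As a minimal sketch of how a handler might recover the address of the instruction that actually raised the exception, the helper below applies the delay-slot rule described above. The function name is hypothetical, and the position of the BD bit (bit 31 of Cause) is an assumption based on the usual MIPS Cause layout; check it against the Cause register description.

```c
#include <stdint.h>

/* Assumption: Cause.BD is bit 31, as in the usual MIPS Cause layout. */
#define CAUSE_BD (UINT32_C(1) << 31)

/* Hypothetical helper: address of the instruction that raised the exception.
 * When BD is set, EPC holds the address of the branch, and the offending
 * instruction sits in the delay slot one word later. */
static uint32_t faulting_address(uint32_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4u : epc;
}
```

Note that the handler still resumes at EPC itself, so that the branch is re-executed together with its delay slot.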
4.2.14 PRId Register (15)
Bits 31:16  0    (16 bits)
Bits 15:8   Imp  (8 bits)
Bits 7:0    Rev  (8 bits)
Figure 4-16. PRId Register
The 32-bit read-only Processor Revision Identifier (PRId) register contains information identifying the implementation and revision level of the C790 and COP0. Figure 4-16 shows the format of the PRId register; Table 4-15 describes the PRId register fields. The low-order byte (bits 7:0) of the PRId register is interpreted as a revision number, and the high-order byte (bits 15:8) is interpreted as an implementation number. The implementation number of the C790 processor is 0x38. The contents of the high-order halfword (bits 31:16) of the register are reserved. The revision number is stored as a value in the form y.x, where y is a major revision number in bits 7:4 and x is a minor revision number in bits 3:0. The revision number can distinguish some chip revisions, but there is no guarantee that changes to the chip will necessarily be reflected in the PRId register, or that changes to the revision number necessarily reflect real chip changes. For this reason, these values are not listed and software should not rely on the revision number in the PRId register to characterize the chip.
Table 4-15. PRId Register Fields

Field  Bits   Type       Initial Value    Description
Imp    15:8   Read-only  0x38             Implementation number
Rev    7:0    Read-only  Revision number  Revision number of each mask
0      31:16  Read-only                   Reserved. Must be written as zeroes, and returns zeroes when read.
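The field split described for PRId can be sketched as follows. The struct and function names are hypothetical; the bit positions are the ones given in the text (Imp in bits 15:8, major revision in bits 7:4, minor revision in bits 3:0).

```c
#include <stdint.h>

/* Hypothetical container for the decoded PRId fields. */
typedef struct {
    uint8_t imp;        /* implementation number, bits 15:8 */
    uint8_t rev_major;  /* y in y.x, bits 7:4 */
    uint8_t rev_minor;  /* x in y.x, bits 3:0 */
} prid_fields;

static prid_fields prid_decode(uint32_t prid)
{
    prid_fields f;
    f.imp       = (uint8_t)((prid >> 8) & 0xFFu);
    f.rev_major = (uint8_t)((prid >> 4) & 0xFu);
    f.rev_minor = (uint8_t)(prid & 0xFu);
    return f;
}
```

As the text cautions, software should treat the revision fields as informational only.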
4.2.15 Config Register (16)
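Bit  31      0    (1 bit)
Bits 30:28   EC   (3 bits)
Bits 27:19   0    (9 bits)
Bit  18      DIE  (1 bit)
Bit  17      ICE  (1 bit)
Bit  16      DCE  (1 bit)
Bit  15      BE   (1 bit)
Bit  14      0    (1 bit)
Bit  13      NBE  (1 bit)
Bit  12      BPE  (1 bit)
Bits 11:9    IC   (3 bits)
Bits 8:6     DC   (3 bits)
Bits 5:3     0    (3 bits)
Bits 2:0     K0   (3 bits)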
Figure 4-17. Config Register Format
The Config register specifies various configuration options which can be selected. Figure 4-17 shows the format of the Config register; Table 4-16 describes the Config register fields. Some configuration options, as defined by Config bits 30:28, 15 and 11:6, are set by the hardware during reset and are included in the Config register as read-only status bits for the software to access. Other configuration options, such as bits 18:16 and 13:12, are set by hardware during reset and can be modified by software. The remaining configuration options, such as bits 2:0, are read/write and controlled by software; on reset these fields are undefined.
Table 4-16. Config Register Fields

EC (bits 30:28), Read-only, initial value: 0
    Bus clock ratio.
    000: processor clock frequency divided by 2
    001 ~ 111: (Reserved)
DIE (bit 18), Read/Write, initial value: 0
    Double issue enable.
    0: Single issue
    1: Double issue
ICE (bit 17), Read/Write, initial value: 0
    Setting this bit to 1 enables the instruction cache.
    0: Instruction cache disable
    1: Instruction cache enable
    The CACHE instruction for the instruction cache is enabled regardless of the value of this bit.
DCE (bit 16), Read/Write, initial value: 0
    Setting this bit to 1 enables the data cache.
    0: Data cache disable
    1: Data cache enable
    If the cache is disabled, the PREF instruction becomes a NOP.
BE (bit 15), Read-only, initial value: Pin
    Big Endian.
    0: Little Endian
    1: Big Endian
NBE (bit 13), Read/Write, initial value: 0
    Setting this bit to 1 enables non-blocking load.
    0: Disable non-blocking loads and hit under miss
    1: Enable non-blocking loads and hit under miss
BPE (bit 12), Read/Write, initial value: 0
    Setting this bit to 1 enables branch prediction.
    0: Disable branch prediction
    1: Enable branch prediction
IC (bits 11:9), Read-only, initial value: 011
    Instruction cache size (instruction cache size = 2^(12+IC) bytes).
    011: 32 KB
DC (bits 8:6), Read-only, initial value: 011
    Data cache size (data cache size = 2^(12+DC) bytes).
    011: 32 KB
K0 (bits 2:0), Read/Write, initial value: Undefined
    kseg0 coherency algorithm.
    000: Reserved
    001: Reserved
    010: Uncached
    011: Cacheable, write-back, write allocate
    100: Reserved
    101: Reserved
    110: Reserved
    111: Uncached Accelerated
0 (bits 31, 27:19, 14, 5:3), Read-only, initial value: 0
    Reserved. Must be written as zeroes, and returns zeroes when read.
With single issue enabled (DIE = 0), the C790 always fetches two instructions but only issues a single instruction.
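To make the cache-size formula concrete, here is a minimal sketch that extracts a few Config fields and evaluates 2^(12+IC) and 2^(12+DC). The function names are hypothetical; the bit positions come from Table 4-16.

```c
#include <stdint.h>

/* Instruction cache size in bytes: 2^(12 + IC), IC in bits 11:9. */
static uint32_t config_icache_bytes(uint32_t config)
{
    return UINT32_C(1) << (12u + ((config >> 9) & 0x7u));
}

/* Data cache size in bytes: 2^(12 + DC), DC in bits 8:6. */
static uint32_t config_dcache_bytes(uint32_t config)
{
    return UINT32_C(1) << (12u + ((config >> 6) & 0x7u));
}

/* Nonzero when double issue is enabled (DIE, bit 18). */
static int config_double_issue(uint32_t config)
{
    return (int)((config >> 18) & 0x1u);
}

/* kseg0 coherency algorithm (K0, bits 2:0). */
static uint32_t config_k0(uint32_t config)
{
    return config & 0x7u;
}
```

With IC = DC = 011, both size helpers return 32768, which matches the 32 KB entries in the table.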
4.2.16 BadPAddr Register (23)
Bits 31:4   BdPAddr (28 bits)
Bits 3:0    0       (4 bits)
Figure 4-18. BadPAddr Register Format
The Bad Physical Address register (BadPAddr) is a read-only register that contains the most recent physical address that caused a bus error. It is updated with a new value whenever Status.BEM is clear (0). Once this bit is set (on the occurrence of a bus error) the register holds the value. Figure 4-18 shows BadPAddr register format; Table 4-17 describes the register fields.
Table 4-17. BadPAddr Register Fields

Field    Bits  Type       Initial Value  Description
BdPAddr  31:4  Read-only  Undefined      Physical Address value
0        3:0   Read-only  0              Reserved. Returns zeros when read.
4.2.17 Debug Registers (24)
There are seven separately addressable debug registers, which are all assigned to COP0 register 24. Each of the seven registers is accessed by specifying a subaccess code, which is held in bits 2:0 of the instruction code.
Breakpoint Control Register (BPC) (subaccess code 0)
Bit  31     IAE
Bit  30     DRE
Bit  29     DWE
Bit  28     DVE
Bit  27     0
Bit  26     IUE
Bit  25     ISE
Bit  24     IKE
Bit  23     IXE
Bit  22     0
Bit  21     DUE
Bit  20     DSE
Bit  19     DKE
Bit  18     DXE
Bit  17     ITE
Bit  16     DTE
Bit  15     BED
Bits 14:3   0
Bit  2      DWB
Bit  1      DRB
Bit  0      IAB
See Table 13-3 for a detailed description of individual BPC register fields.
Instruction Address Breakpoint (IAB) (subaccess code 2)
    Bits 31:2   IAB   (30 bits)
    Bits 1:0    0     (2 bits)
Instruction Address Breakpoint Mask Register (IABM) (subaccess code 3)
    Bits 31:2   IABM  (30 bits)
    Bits 1:0    0     (2 bits)
Data Address Breakpoint Register (DAB) (subaccess code 4)
    Bits 31:0   DAB   (32 bits)
Data Address Breakpoint Mask Register (DABM) (subaccess code 5)
    Bits 31:0   DABM  (32 bits)
Data Value Breakpoint Register (DVB) (subaccess code 6)
    Bits 31:0   DVB   (32 bits)
Data Value Breakpoint Mask Register (DVBM) (subaccess code 7)
    Bits 31:0   DVBM  (32 bits)
4.2.18 Performance Counter Registers (25)
There are three separately addressable performance counter registers, which are all assigned to COP0 register 25. Each of the three registers is accessed by specifying a subaccess code, which is held in bits 1:0 of the instruction code. All performance counter registers are read/write registers.
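Performance Counter Control Register (PCCR)
    Bit  31     CTE     (1 bit)
    Bits 30:20  0       (11 bits)
    Bits 19:15  EVENT1  (5 bits)
    Bit  14     U1      (1 bit)
    Bit  13     S1      (1 bit)
    Bit  12     K1      (1 bit)
    Bit  11     EXL1    (1 bit)
    Bit  10     0       (1 bit)
    Bits 9:5    EVENT0  (5 bits)
    Bit  4      U0      (1 bit)
    Bit  3      S0      (1 bit)
    Bit  2      K0      (1 bit)
    Bit  1      EXL0    (1 bit)
    Bit  0      0       (1 bit)
Performance Counter Register 0 (PCR0)
    Bit  31     OVFL    (1 bit)
    Bits 30:0   VALUE   (31 bits)
Performance Counter Register 1 (PCR1)
    Bit  31     OVFL    (1 bit)
    Bits 30:0   VALUE   (31 bits)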
Figure 4-19. Performance Counter Registers
Table 4-18 lists the field definitions for the Performance Counter Control register.

Table 4-18. Performance Counter Control Register Fields

CTE (bit 31), Read/Write, initial value: 0
    Enables event counting (CTR1, CTR0) and exception generation:
    0: Disable
    1: Enable
EVENT1 (bits 19:15), Read/Write, initial value: Undefined
    Set the event to be monitored by PCR1.
    00000 (0):  Low-order branch issued
    00001 (1):  Processor cycle
    00010 (2):  Dual instruction issue
    00011 (3):  Branch mispredicted
    00100 (4):  TLB miss
    00101 (5):  DTLB miss
    00110 (6):  Data Cache miss
    00111 (7):  WBB single request unavailable
    01000 (8):  WBB burst request unavailable
    01001 (9):  WBB burst request almost full
    01010 (10): WBB burst request full
    01011 (11): CPU data bus busy
    01100 (12): Instruction completed
    01101 (13): Non-BDS instruction completed
    01110 (14): COP1 instruction completed
    01111 (15): Store completed
    10000 (16): No event
    (17-31):    Reserved
EVENT0 (bits 9:5), Read/Write, initial value: Undefined
    Set the event to be monitored by PCR0.
    00000 (0):  Reserved
    00001 (1):  Processor cycle
    00010 (2):  Single instruction issue
    00011 (3):  Branch issue
    00100 (4):  BTAC miss
    00101 (5):  ITLB miss
    00110 (6):  Instruction Cache miss
    00111 (7):  DTLB accessed
    01000 (8):  Non-blocking load
    01001 (9):  WBB single request
    01010 (10): WBB burst request
    01011 (11): CPU address bus busy
    01100 (12): Instruction completed
    01101 (13): Non-BDS instruction completed
    01110 (14): Reserved
    01111 (15): Load completed
    10000 (16): No event
    (17-31):    Reserved
U1, U0 (bits 14, 4), Read/Write, initial value: Undefined
    Enables event counting (PCR1/PCR0) in the User mode.
    0: Disable
    1: Enable
S1, S0 (bits 13, 3), Read/Write, initial value: Undefined
    Enables event counting (PCR1/PCR0) in the Supervisor mode.
    0: Disable
    1: Enable
K1, K0 (bits 12, 2), Read/Write, initial value: Undefined
    Enables event counting (PCR1/PCR0) in the Kernel mode.
    0: Disable
    1: Enable
EXL1, EXL0 (bits 11, 1), Read/Write, initial value: Undefined
    Enables event counting (PCR1/PCR0) when EXL bit is set in the Status register.
    0: Disable
    1: Enable
0 (bits 30:20, 10, 0), Read-only, initial value: 0
    Reserved. Must be written as zero, and returns zero when read.
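The PCCR layout lends itself to a small value builder. The sketch below is illustrative only: the macro and function names are hypothetical, and it merely packs the fields at the bit positions listed in Table 4-18, leaving the reserved bits (30:20, 10, 0) at zero.

```c
#include <stdint.h>

/* Hypothetical per-counter mode flags. Shifted left by 1 they land on
 * EXL0/K0/S0/U0 (bits 1..4); shifted left by 11 on EXL1/K1/S1/U1 (bits 11..14). */
#define PCCR_MODE_EXL 0x1u
#define PCCR_MODE_K   0x2u
#define PCCR_MODE_S   0x4u
#define PCCR_MODE_U   0x8u

/* Pack a PCCR value from event selectors and mode flags for both counters. */
static uint32_t pccr_make(unsigned event0, unsigned modes0,
                          unsigned event1, unsigned modes1, int enable)
{
    uint32_t v = 0;
    v |= (uint32_t)(event0 & 0x1Fu) << 5;   /* EVENT0, bits 9:5   */
    v |= (uint32_t)(modes0 & 0xFu)  << 1;   /* U0/S0/K0/EXL0      */
    v |= (uint32_t)(event1 & 0x1Fu) << 15;  /* EVENT1, bits 19:15 */
    v |= (uint32_t)(modes1 & 0xFu)  << 11;  /* U1/S1/K1/EXL1      */
    if (enable)
        v |= UINT32_C(1) << 31;             /* CTE */
    return v;
}
```

For example, counting processor cycles on PCR0 (event 1) and data cache misses on PCR1 (event 6), in User and Kernel mode only, would use `pccr_make(1, PCCR_MODE_U | PCCR_MODE_K, 6, PCCR_MODE_U | PCCR_MODE_K, 1)`.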
Table 4-19 lists the field definitions for the Performance Counter register 0 (PCR0).

Table 4-19. Performance Counter Register 0 Fields

Field  Bits  Type        Initial Value  Description
OVFL   31    Read/Write  Undefined      Overflow flag
VALUE  30:0  Read/Write  Undefined      The actual counter

Table 4-20 lists the field definitions for the Performance Counter register 1 (PCR1).

Table 4-20. Performance Counter Register 1 Fields

Field  Bits  Type        Initial Value  Description
OVFL   31    Read/Write  Undefined      Overflow flag
VALUE  30:0  Read/Write  Undefined      The actual counter
4.2.19 TagLo (28) and TagHi (29) Registers
TagLo
    Bits 31:12  PTagLo       (20 bits)
    Bits 11:7   Special use  (5 bits)
    Bit  6      D            (1 bit)
    Bit  5      V            (1 bit)
    Bit  4      R            (1 bit)
    Bit  3      L            (1 bit)
    Bits 2:0    Su           (3 bits)
TagHi
    Bits 31:0   Special use  (32 bits)
Figure 4-20. TagLo and TagHi Registers
The TagLo and TagHi registers are 32-bit read/write registers used by the CACHE instruction. For writing to the data cache tags, the TagLo register contains the fields as shown above and the TagHi register is not used. For writing to the data cache data portion the TagLo register contains the data value. For writing to the instruction cache tags the TagLo register contains the fields as defined above except that bits three and six are also reserved bits. For writing to the instruction cache data portion, the TagLo register contains the data (instruction) and the TagHi register contains the steering bits and bits for the BHT as defined in Chapter 7. When reading from the caches, the values in the TagLo and TagHi register are the same as described above for writing. These registers are also used for manipulating the BTAC. See the description of the CACHE instruction in Appendix C for details. Figure 4-20 shows the format of these registers for some of the cache operations.
Table 4-21 lists the field definitions of the TagLo register.

Table 4-21. TagLo Register Fields

PTagLo[31:12] (bits 31:12), Read/Write, initial value: Undefined
    Specifies the 20-bit physical address tag of the cache line.
D (bit 6), Read/Write, initial value: Undefined
    Dirty:
    0: Clean
    1: Dirty
V (bit 5), Read/Write, initial value: Undefined
    Valid:
    0: Invalid
    1: Valid
R (bit 4), Read/Write, initial value: Undefined
    LRF Replacement: This bit participates in the calculation determining which cache way will be used for the next replacement. See Section 7.3.1 for details.
L (bit 3), Read/Write, initial value: Undefined
    Lock: This bit is only used for the data cache. For instruction cache operations this bit is treated as a reserved bit.
    0: For this line, this side is not locked.
    1: For this line, this side is locked.
Special use, Su (bits 11:7, 2:0), Read/Write, initial value: Undefined
    Used by the CACHE instruction to manipulate the branch target address cache. Refer to Chapter 7 for details.
Table 4-22. TagHi Register Fields

Field        Bits  Type        Initial Value  Description
Special use  31:0  Read/Write  Undefined      The TagHi register is used by the CACHE instruction to manipulate some of the bits of the instruction cache. Refer to Chapter 7 for details.
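For the data cache tag format, the TagLo fields can be pulled apart as in the sketch below. The struct and function names are hypothetical; the bit positions are those of Table 4-21. For instruction cache tags, bits 3 and 6 are reserved, so the `lock` and `dirty` members would carry no meaning there.

```c
#include <stdint.h>

/* Hypothetical container for a TagLo value in data cache tag format. */
typedef struct {
    uint32_t ptag;   /* PTagLo, physical address bits 31:12 */
    unsigned dirty;  /* D, bit 6 */
    unsigned valid;  /* V, bit 5 */
    unsigned lrf;    /* R, bit 4 */
    unsigned lock;   /* L, bit 3 */
} taglo_fields;

static taglo_fields taglo_decode(uint32_t taglo)
{
    taglo_fields f;
    f.ptag  = taglo >> 12;
    f.dirty = (taglo >> 6) & 1u;
    f.valid = (taglo >> 5) & 1u;
    f.lrf   = (taglo >> 4) & 1u;
    f.lock  = (taglo >> 3) & 1u;
    return f;
}
```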
4.2.20 ErrorEPC (30)
Bits 31:0   ErrorEPC (32 bits)
Figure 4-21. ErrorEPC Register
The ErrorEPC register is similar to the EPC register, except that ErrorEPC is used on non-maskable interrupt (NMI), debug, SIO, and performance counter exceptions. The read/write ErrorEPC register contains the virtual address at which instruction processing can resume after servicing an error. This address can be:
* the virtual address of the instruction that caused the exception
* the virtual address of the immediately preceding branch or jump instruction (when the instruction is in a branch delay slot, and the BD2 bit in the Cause register is set).
Table 4-23 lists the field definition of the ErrorEPC register.
Table 4-23. ErrorEPC Register Field

Field     Bits  Type        Initial Value  Description
ErrorEPC  31:0  Read/Write  Undefined      Contains the virtual address at which instruction processing can resume after servicing an error.
Chapter 5 Exception Processing and Reset
5. Exception Processing and Reset
This chapter describes the exception processing, including level 1 and level 2 exceptions.
5.1 The Exception Handling Process
Exceptions can be recognized while the processor is in any of its three operating modes: User, Supervisor, or Kernel. Exceptions are categorized into two groups, level 1 exceptions and level 2 exceptions, as shown in Table 5-1.
Table 5-1. Exception Levels

Level 1 Exceptions          Level 2 Exceptions
Interrupt                   Reset
TLB Modified                NMI
TLB Refill                  Performance Counter
TLB Invalid                 Debug
Address Error               SIO
Syscall
Break
Trap
Reserved Instruction
Coprocessor Unusable
Integer Overflow
Bus Error
Floating Point Exception
Compatibility Note: Level 2 exceptions are a generalization of the "error level" exception processing defined in earlier MIPS implementations.
5.1.1 Level 1 Exceptions
Exception Processing
When the processor takes a level 1 exception, the processor switches to Kernel mode. Rather than setting the Status.KSU bits to effect the switch, the Status.EXL bit is set to 1. Whenever Status.EXL is 1, the operating mode is Kernel mode, regardless of the setting of Status.KSU.
The processor then saves the virtual address of the instruction canceled by the exception. This address is saved in the EPC register. If the canceled instruction is in the delay slot of a branch instruction, the Cause.BD bit is set to 1 and EPC is set to the address of the branch instruction (rather than the delay slot). For non-delay-slot instructions, Cause.BD is set to 0. If the Status.EXL bit was 1 before the exception is taken, EPC and Cause.BD are not set. The exception service routine examines Cause.BD to determine the true address of the instruction that raised the exception.
In addition to setting EPC, Cause.BD, and Status.EXL, the 5-bit field Cause.ExcCode is also set. This field specifies the cause of the exception. The Cause.CE field may also be set when a Coprocessor Unusable exception is raised. After setting those bits, the processor jumps to the exception vector address.
The basic exception handling operation is described by the Level 1 exception processing flowchart in Figure 5-1.
Disabled exceptions in level 1 exception handler
Once a level 1 exception service routine is entered, interrupts and bus errors are unconditionally disabled.
C790 Programming Note: The only level 1 exceptions that are unconditionally disabled within a level 1 exception handler are external interrupts and bus errors. All other level 1 exceptions still occur and are recognized (if enabled). A software system that makes use of such exceptions must use extreme care. In particular, it must make sure that it has saved EPC and Cause.BD somewhere (e.g. in a software managed stack) before the exception occurs.
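* Set Cause.ExcCode. Set Cause.CE to the coprocessor number on a CpU exception. Set BadVAddr on an AdES, AdEL, or any TLB exception. Set Context and EntryHi on any TLB exception. Set BadPAddr on a Bus Error.
* If Status.EXL = 0: when the instruction is in a branch delay slot, EPC <- PC - 4 and Cause.BD <- 1; otherwise EPC <- PC and Cause.BD <- 0. If Status.EXL = 1, EPC and Cause.BD are left unchanged.
* Status.EXL <- 1.
* Select the vector offset. If Status.EXL was 0: TLB Refill uses offset 0x0, Interrupt uses offset 0x200, and all others use offset 0x180. If Status.EXL was 1: offset 0x180.
* If Status.BEV = 0 (normal), PC <- 0x8000 0000 + Offset. If Status.BEV = 1 (bootstrap), PC <- 0xBFC0 0200 + Offset.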
Figure 5-1. Level 1 Exception processing flowchart
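The EPC and Cause.BD update rule in this flow can be modeled in a few lines. This is a software sketch for illustration, not a description of the hardware; the struct and function names are hypothetical, and only the three pieces of state discussed here are modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the state touched on level 1 exception entry. */
typedef struct {
    bool     exl;  /* Status.EXL */
    bool     bd;   /* Cause.BD   */
    uint32_t epc;  /* EPC        */
} l1_state;

/* Apply the entry rule: EPC and BD are written only when EXL was 0. */
static void level1_enter(l1_state *s, uint32_t pc, bool in_delay_slot)
{
    if (!s->exl) {
        s->epc = in_delay_slot ? pc - 4u : pc;  /* point at the branch if needed */
        s->bd  = in_delay_slot;
    }
    s->exl = true;
}
```

Calling `level1_enter` a second time without clearing `exl` leaves `epc` and `bd` untouched, which is why a handler that can take nested exceptions has to save them first.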
5.1.2 Level 2 Exceptions
Exception Processing
When the processor takes a level 2 exception, the processor switches to Kernel mode by setting Status.ERL to 1. The address of the instruction where the level 2 exception was recognized is stored in the ErrorEPC register. If the canceled instruction is in the delay slot of a branch instruction, the Cause.BD2 bit is set to 1 and ErrorEPC is set to the address of the branch instruction (rather than the delay slot). For non-delay-slot instructions, Cause.BD2 is set to 0. In addition, the cause of the exception is stored in Cause.EXC2. After setting those bits, the processor jumps to the exception vector address.
The basic level 2 exception handling operation is described by the Level 2 exception processing flowchart in Figure 5-2.
Disabled exceptions in level 2 exception handler
When executing a level 2 exception service routine, the following exceptions are disabled:
* NMI, Interrupt, and Bus error
* Debug, SIO, and Performance counter
C790 Implementation Note: Any external exception that is not level-sensitive (e.g. NMI) must be held until it is recognized; i.e. at least until the Level 2 handler is exited.
C790 Programming Note: It is the programmer's responsibility to ensure that all other internal exceptions (e.g. OVERFLOW) never occur within a Level 2 handler. If they do occur, the corresponding Level 1 exception handler will be entered. Since both Status.EXL and Status.ERL will be set when servicing this (nested) exception, the ERET used to exit the service routine will operate incorrectly.
C790 Programming Note: When Status.ERL = 1, the user address region, Kuseg, becomes a 2^31-byte unmapped, uncached address space (that is, mapped directly to physical addresses 0x0000 0000-0x7FFF FFFF).
* Set Cause.EXC2.
* If the instruction is in a branch delay slot, ErrorEPC <- PC - 4 and Cause.BD2 <- 1; otherwise ErrorEPC <- PC and Cause.BD2 <- 0.
* Status.ERL <- 1.
* Reset or NMI: Status.BEV <- 1. On Reset (not NMI), additionally: Status.BEM <- 0, Config.DIE/ICE/DCE <- 0, Config.NBE/BPE <- 0, Random <- 47, Wired <- 0, PCCR.CTE <- 0, BPC.IAE/DRE/DWE <- 0. Then PC <- 0xBFC0 0000.
* Performance Counter: Offset 0x80. Debug or SIO: Offset 0x100. If Status.DEV = 0 (normal), PC <- 0x8000 0000 + Offset. If Status.DEV = 1 (bootstrap), PC <- 0xBFC0 0200 + Offset.
Figure 5-2. Level 2 Exception processing flowchart
5.2 Exception Vector Locations
Exception vector addresses for level 1 exceptions are shown in Table 5-2. The vector address for TLB refill depends on the Status.EXL bit. The vector addresses for level 1 exceptions also depend on the Status.BEV bit.
Table 5-2. Exception Vectors for Level 1 exceptions

                        Vector Address
Exceptions              BEV = 0       BEV = 1
TLB Refill (EXL = 0)    0x8000 0000   0xBFC0 0200
TLB Refill (EXL = 1)    0x8000 0180   0xBFC0 0380
Interrupt               0x8000 0200   0xBFC0 0400
Others                  0x8000 0180   0xBFC0 0380
Exception vector addresses for level 2 exceptions are shown in Table 5-3. The vector addresses for level 2 exceptions also depend on the Status.DEV bit.
Table 5-3. Exception Vectors for Level 2 exceptions

                        Vector Address
Exceptions              DEV = 0       DEV = 1
Reset, NMI              0xBFC0 0000   0xBFC0 0000
Performance Counter     0x8000 0080   0xBFC0 0280
Debug, SIO              0x8000 0100   0xBFC0 0300
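The two vector tables reduce to a base address plus an offset, which the following sketch computes. The enum and function names are hypothetical; the bases and offsets are the ones listed in Tables 5-2 and 5-3.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { L1_TLB_REFILL, L1_INTERRUPT, L1_OTHER } l1_kind;
typedef enum { L2_RESET_OR_NMI, L2_PERF_COUNTER, L2_DEBUG_OR_SIO } l2_kind;

/* Level 1 vector: exl_before is Status.EXL at the time the exception is taken. */
static uint32_t level1_vector(l1_kind kind, bool exl_before, bool bev)
{
    uint32_t offset;
    if (kind == L1_TLB_REFILL && !exl_before)
        offset = 0x000u;            /* dedicated TLB refill vector */
    else if (kind == L1_INTERRUPT)
        offset = 0x200u;
    else
        offset = 0x180u;            /* common vector */
    return (bev ? 0xBFC00200u : 0x80000000u) + offset;
}

/* Level 2 vector: Reset and NMI ignore Status.DEV. */
static uint32_t level2_vector(l2_kind kind, bool dev)
{
    uint32_t offset;
    if (kind == L2_RESET_OR_NMI)
        return 0xBFC00000u;
    offset = (kind == L2_PERF_COUNTER) ? 0x080u : 0x100u;
    return (dev ? 0xBFC00200u : 0x80000000u) + offset;
}
```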
5.3 Cause Register Setting
The Cause.ExcCode bits are set when a level 1 exception is taken. The Cause.ExcCode setting is shown in Table 5-4.
Table 5-4. Cause.ExcCode Field

ExcCode  Exception
0        Int (Interrupt)
1        Mod (TLB modification exception)
2        TLBL (TLB exception; load or inst fetch)
3        TLBS (TLB exception; store)
4        AdEL (Address error exception; load or inst fetch)
5        AdES (Address error exception; store)
6        IBE (Bus error exception; instruction fetch)
7        DBE (Bus error exception; load or store)
8        Sys (Syscall exception)
9        Bp (Breakpoint exception)
10       RI (Reserved instruction exception)
11       CpU (Coprocessor Unusable exception)
12       Ov (Integer Overflow exception)
13       Tr (Trap exception)
14       Reserved
15       FPE (Floating Point Exception)
16-31    Reserved
The Cause.EXC2 bits are set when a level 2 exception is taken. The Cause.EXC2 setting is shown in Table 5-5.
Table 5-5. Cause.EXC2 Field

EXC2  Exception
0     Res (Reset exception)
1     NMI (Non-Maskable Interrupt)
2     PerfC (Performance Counter exception)
3     Dbg (Debug exception), SIO (SIO exception)
4     SS (Single Step)
5-7   Reserved
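A common exception handler typically begins by extracting Cause.ExcCode and dispatching on it. As a small sketch of the extraction step, the hypothetical helper below maps a Cause value to the mnemonic from Table 5-4, using the ExcCode position (bits 6:2) given in Chapter 4.

```c
#include <stdint.h>

/* Hypothetical helper: mnemonic for the ExcCode field (Cause bits 6:2). */
static const char *exccode_name(uint32_t cause)
{
    static const char *const names[16] = {
        "Int",  "Mod",  "TLBL", "TLBS", "AdEL", "AdES", "IBE",      "DBE",
        "Sys",  "Bp",   "RI",   "CpU",  "Ov",   "Tr",   "Reserved", "FPE"
    };
    uint32_t code = (cause >> 2) & 0x1Fu;
    return (code < 16u) ? names[code] : "Reserved";  /* 16-31 are reserved */
}
```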
5.4 Masking an exception
The following exceptions can be masked by setting bits in the Status register: NMI, Performance Counter, Debug, Bus error, Interrupt, and SIO. Table 5-6 shows which bits mask each exception. Exceptions marked with "X" can be masked by setting (BEM, EXL, or ERL) or clearing (IE or IM) the corresponding bit in the Status register.

Table 5-6. Masking exceptions

                            Mask bit (in Status register)
Exception                   IE   IM   BEM  EXL  ERL
Reset
NMI                                             X
Performance Counter                             X
Debug                                           X
SIO                                             X
Address error
TLB Refill/Invalid/Modify
Bus error                             X    X    X
Syscall
Break
Reserved instruction
Coprocessor Unusable
Interrupt                   X    X         X    X
Integer overflow
Trap
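The masking rules can be restated as a predicate. The sketch below follows the rules given in this chapter's prose (level 2 handlers disable NMI, Debug, SIO, and Performance Counter via ERL; Bus error is masked by BEM, EXL, or ERL; interrupts require IE and the relevant IM bit to be set and are disabled by EXL or ERL). All type and function names are hypothetical, and the Status bits are passed as booleans so that no bit positions are assumed.

```c
#include <stdbool.h>

typedef enum {
    EXC_NMI, EXC_PERF_COUNTER, EXC_DEBUG, EXC_SIO,
    EXC_BUS_ERROR, EXC_INTERRUPT,
    EXC_UNMASKABLE  /* Reset, address error, TLB, Syscall, Break, RI, CpU, Ov, Trap */
} exc_class;

/* Hypothetical snapshot of the relevant Status bits. */
typedef struct {
    bool ie;   /* Status.IE */
    bool im;   /* Status.IM bit for the interrupt source in question */
    bool bem;  /* Status.BEM */
    bool exl;  /* Status.EXL */
    bool erl;  /* Status.ERL */
} status_bits;

/* True when the given exception class is currently masked. */
static bool exception_masked(exc_class e, status_bits s)
{
    switch (e) {
    case EXC_NMI:
    case EXC_PERF_COUNTER:
    case EXC_DEBUG:
    case EXC_SIO:
        return s.erl;
    case EXC_BUS_ERROR:
        return s.bem || s.exl || s.erl;
    case EXC_INTERRUPT:
        return !s.ie || !s.im || s.exl || s.erl;
    default:
        return false;
    }
}
```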
5.5 Detailed Description
5.5.1 Exception Priority
Exception priority rules determine which exception is taken first if multiple exceptions occur on the same instruction. Table 5-7 shows the priority order of the exceptions.

Table 5-7. Exception Priority Order

Reset (highest priority)
NMI
Performance Counter
Instruction Breakpoint (debug)
Address error - Instruction fetch
TLB refill - Instruction fetch
TLB invalid - Instruction fetch
Bus Error - Instruction fetch
Single Step
SYSCALL, BREAK, Reserved Instruction*, Floating Point Exception, or Coprocessor Unusable*
Interrupt
Data address/value breakpoint (debug)
SIO
Integer overflow, Trap
Address error - data access
TLB refill - data access
TLB invalid - data access
TLB modified - data access
Bus error - data access (lowest priority)

* Exception priority between the Reserved Instruction exception (RI) and the Coprocessor Unusable exception (CpU): the priorities of the two exceptions are the same. However, when Status.CU[1] = 0, an attempt to execute any FPU (COP1) instruction causes a CpU exception. When Status.CU[1] = 1, an attempt to execute an unimplemented COP1 sub-instruction is reported as an FPE (E): unimplemented FPU exception. On the other hand, an attempt to execute any COP0 class Reserved Instruction causes an RI exception regardless of Status.CU[0].
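When modeling the processor in software (for example in a simulator), the priority order can be encoded as an enumeration whose numeric order mirrors Table 5-7, so that selecting the winner among simultaneously pending exceptions becomes a scan for the lowest set bit. The names below are hypothetical and the code is a sketch, not a description of the hardware logic.

```c
#include <stdint.h>

/* Enumerators are listed from highest to lowest priority, as in Table 5-7. */
typedef enum {
    PRI_RESET = 0,
    PRI_NMI,
    PRI_PERF_COUNTER,
    PRI_INSTR_BREAKPOINT,
    PRI_ADDR_ERROR_FETCH,
    PRI_TLB_REFILL_FETCH,
    PRI_TLB_INVALID_FETCH,
    PRI_BUS_ERROR_FETCH,
    PRI_SINGLE_STEP,
    PRI_SYS_BP_RI_FPE_CPU,
    PRI_INTERRUPT,
    PRI_DATA_BREAKPOINT,
    PRI_SIO,
    PRI_OVERFLOW_TRAP,
    PRI_ADDR_ERROR_DATA,
    PRI_TLB_REFILL_DATA,
    PRI_TLB_INVALID_DATA,
    PRI_TLB_MODIFIED_DATA,
    PRI_BUS_ERROR_DATA,
    PRI_COUNT
} exc_priority;

/* pending has bit i set when exception i is pending.
 * Returns the highest-priority pending exception, or -1 if none. */
static int highest_priority(uint32_t pending)
{
    int i;
    for (i = 0; i < PRI_COUNT; i++) {
        if (pending & (UINT32_C(1) << i))
            return i;
    }
    return -1;
}
```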
5.5.2 Reset Exception
Cause
The RESET exception occurs when the Reset* signal is asserted and then deasserted. This Reset exception is not maskable.
Exception Level: 2
Vector Address: 0xBFC00000
Processing
The RESET exception vector is located within uncached and unmapped address space. Hence the cache and TLB need not be initialized in order to process the exception. The contents of all registers in the CPU are undefined when this exception is recognized, except for the following register fields:
* In the Status register, Status.ERL and Status.BEV are set to 1. Status.BEM is set to 0. All other bits except for 0-fixed bits are undefined.
* In the Cause register, Cause.EXC2 is set to 0 (to indicate that a Reset occurred). All other bits except for 0-fixed bits are undefined.
* In the Config register, DIE, ICE, DCE, NBE, and BPE bits are set to 0. All other bits except for fixed-value, read-only bits are undefined.
* The Random register is initialized to the value of its upper bound (47).
* The Wired register is initialized to 0.
* The Counter Enable flag in the Performance Counter Control register (PCCR.CTE) is set to 0.
* The breakpoint address enable flags in the Breakpoint Control register, BPC.IAE, BPC.DRE, and BPC.DWE, are all set to 0.
* Valid, Dirty, LRF, and Lock bits of the data cache and the Valid and LRF bits of the instruction cache are initialized to 0 on reset.
Servicing
The RESET exception is serviced by:
* initializing all processor registers, coprocessor registers, caches, and the memory system
* performing diagnostic tests
* bootstrapping the operating system
5.5.3 Non-Maskable Interrupt (NMI) Exception
Cause
The Non-Maskable Interrupt (NMI) exception occurs in response to the falling edge of the NMI* signal. The NMI exception is maskable by setting the Status.ERL bit. It is recognized regardless of the settings of the Status.EXL and Status.IE bits.
Exception Level: 2
Vector Address: 0xBFC00000
Processing
NMI and RESET exceptions share the same exception vector. This vector is located within uncached and unmapped address space; therefore, the cache and TLB need not be initialized in order to process the exception. When the NMI exception is recognized, all register contents are preserved with the following exceptions:
* ErrorEPC register, which contains the restart PC, and Cause.BD2, which records whether the NMI was recognized in a branch delay slot.
* Status.ERL and Status.BEV flags are both set to 1.
* Cause.EXC2 is set to 1 (NMI).
Servicing
Note that the NMI service routine entry address does not depend on the Status.BEV flag. In fact, the Status.BEV bit is unconditionally set to 1 before the NMI handler is entered. It is up to the NMI service routine to restore the setting of the Status.BEV bit prior to exit.
5.5.4 Performance Counter Exception
Cause
A Performance Counter exception occurs when a performance counter overflows and the conditions described in Section 9.3.2 are met. This exception is maskable by setting the Status.ERL bit.
Exception Level: 2
Vector Address: 0x8000 0080 (DEV = 0), 0xBFC0 0280 (DEV = 1)
Processing
The value of Cause.EXC2 is set to 2 (PerfC). The ErrorEPC register contains the address of the instruction where the Performance Counter exception was detected unless it is in a branch delay slot, in which case the ErrorEPC register contains the address of the preceding branch instruction and Cause.BD2 is set.
Servicing
When this exception is recognized, control is transferred to the applicable service routine.
5.5.5 Debug Exception
Cause
A DEBUG exception occurs whenever hardware breakpoint conditions as described in Chapter 13 are detected. This exception is maskable by setting the Status.ERL bit.
Exception Level: 2
Vector Address: 0x8000 0100 (DEV = 0), 0xBFC0 0300 (DEV = 1)
Processing
The value of Cause.EXC2 is set to 3 (Dbg). The ErrorEPC register contains the address of the instruction where the debug exception was detected unless it is in a branch delay slot, in which case the ErrorEPC register contains the address of the preceding branch instruction and Cause.BD2 is set. Note that the Load data value breakpoint exception is imprecise. That is, the instruction where the breakpoint is detected is not the load instruction that triggers the breakpoint; see Chapter 13 for more details.
Servicing
When this exception is recognized, control is transferred to the applicable service routine.
5.5.6 Address Error Exception
Cause
The Address Error exception occurs when an attempt is made to execute one of the following:
* load or store a doubleword that is not aligned on a doubleword boundary
* load, fetch, or store a word that is not aligned on a word boundary
* load or store a halfword that is not aligned on a halfword boundary
* reference the kernel address space from User or Supervisor mode
* reference the supervisor address space from User mode
This exception is not maskable.
Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)
Processing
The value of Cause.ExcCode is set to 4 (AdEL) or 5 (AdES), depending on whether the exception was caused by an instruction reference (AdEL), load operation (AdEL), or store operation (AdES). When this exception is recognized, the virtual address that was not properly aligned or that referenced protected address space is stored in the BadVAddr register. This update occurs even if the exception occurs within a level 1 or level 2 exception handler. The contents of the VPN field of the Context and EntryHi registers are undefined, as are the contents of the EntryLo register. The EPC register contains the address of the instruction that caused the exception, unless this instruction is in a branch delay slot. If it is in a branch delay slot, the EPC register contains the address of the preceding branch instruction and Cause.BD is set to indicate that the branch delay slot instruction actually caused the exception.
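The alignment conditions listed under Cause amount to checking the low-order address bits against the access size. A minimal sketch, with a hypothetical function name, for the halfword (2), word (4), and doubleword (8) cases:

```c
#include <stdint.h>
#include <stdbool.h>

/* True when vaddr is not aligned to size bytes.
 * size must be a power of two: 2 (halfword), 4 (word), or 8 (doubleword). */
static bool is_misaligned(uint32_t vaddr, uint32_t size)
{
    return (vaddr & (size - 1u)) != 0u;
}
```

The privilege conditions (kernel or supervisor space referenced from a less privileged mode) depend on the address map and are not modeled here.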
5.5.7 TLB Refill Exception
Cause
The TLB refill exception occurs when there is no TLB entry to match a reference to a mapped address space. This exception is not maskable.
Exception Level: 1
Vector Address:
EXL = 0: 0x8000 0000 (BEV = 0), 0xBFC0 0200 (BEV = 1)
EXL = 1: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)
Processing
The value of Cause.ExcCode is set to either 2 (TLBL) or 3 (TLBS). This code indicates whether the exception was caused by an instruction reference, load operation, or store operation. When this exception is recognized, the BadVAddr, Context, and EntryHi registers are updated to hold the virtual address that failed address translation. The EntryHi register also contains the ASID for which the translation fault occurred. These actions take place even if the exception is recognized within a level 1 or level 2 exception handler. The Random register normally contains a valid location in which to place the replacement TLB entry. The contents of the EntryLo register are undefined. The EPC register contains the address of the instruction that caused the exception, unless this instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and Cause.BD is set. The EPC register and BD bit in the Cause register point to the address of the instruction causing the exception.
Servicing
To service this exception, the contents of the Context register are used as a virtual address to fetch memory locations containing the physical page frame and access control bits for a pair of TLB entries. The two entries are placed into the EntryLo0/EntryLo1 registers; the EntryHi and EntryLo registers are then written into the TLB. It is possible that the virtual address used to obtain the physical address and access control information is on a page that is not resident in the TLB. This condition is processed by allowing a TLB refill exception in the TLB refill handler. This second exception goes to the common exception vector because the EXL bit of the Status register is set.
5.5.8 TLB Invalid Exception
Cause
The TLB invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid (TLB valid bit cleared). This exception is not maskable.
Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)
Processing
The value of Cause.ExcCode is set to either 2 (TLBL) or 3 (TLBS). This code indicates whether the exception was caused by an instruction reference, load operation, or store operation. When this exception is recognized, the BadVAddr, Context, and EntryHi registers are loaded with the virtual address that failed address translation. The EntryHi register also contains the ASID for which the translation fault occurred. These actions occur even if the exception is recognized within a level 1 or level 2 exception handler. The Random register normally contains a valid location in which to put the replacement TLB entry. The contents of the EntryLo register are undefined. The EPC register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.
Servicing
A TLB entry is typically marked invalid when one of the following is true:
* a virtual address does not exist
* the virtual address exists, but is not in main memory (a page fault)
* a trap is desired on any reference to the page (for example, to maintain a reference bit)
After servicing the cause of a TLB Invalid exception, the TLB entry is located with TLBP (TLB Probe), and replaced by an entry with that entry's Valid bit set.
5.5.9 TLB Modified Exception
Cause
The TLB modified exception occurs when a store operation generates a virtual address that matches a TLB entry that is marked valid but is not dirty and therefore is not writable. This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 1 (Mod) and the BadVAddr, Context, and EntryHi registers contain the virtual address that failed address translation. The EntryHi register also contains the ASID for which the translation fault occurred. These actions occur even if the exception is recognized within a level 1 or level 2 exception handler. The contents of the EntryLo register are undefined. The EPC register contains the address of the instruction that caused the exception unless that instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Servicing
The kernel uses the failed virtual address or virtual page number to identify the corresponding access control information. The page identified may or may not permit write accesses; if writes are not permitted, a write protection violation occurs. If write accesses are permitted, the page frame is marked dirty/writable by the kernel in its own data structures. The TLBP instruction places the index of the TLB entry that must be altered into the Index register. The EntryLo register is loaded with a word containing the physical page frame and access control bits (with the D bit set), and the EntryHi and EntryLo registers are written into the TLB.
5.5.10 Bus Error Exception
Cause
A Bus Error exception is raised when the BUSERR* signal is asserted during bus transactions. This exception is masked when Status.BEM, Status.EXL or Status.ERL is set to 1.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 6 (IBE) or 7 (DBE), indicating whether the exception was caused by an instruction reference (IBE), a load operation (DBE), or a store operation (DBE). The BadPAddr register is set to the physical address that caused the bus error when the Status.BEM bit is 0. The EPC register and the BD bit in the Cause register point to the address of the instruction currently being executed by the processor.

Note that there is no necessary relationship between a bus error and the instruction currently being executed. For example, a bus error may be caused by an instruction prefetch, or by a data cache line operation that is unrelated to any instruction. Furthermore, it could be caused by a load or store that was issued several instructions prior to the instruction that was executing when the bus error was recognized.

If a bus error is caused by a load or store instruction, the instruction is retired. If the instruction is a store, how memory is updated depends on the memory subsystem's design. If the instruction is a load, the value loaded into the destination register is indeterminate. If a data value breakpoint is pending for the memory address accessed, breakpoint recognition is implementation dependent.

Servicing
In the C790 the bus error exception is imprecise, and it is therefore difficult to recover from it and continue processing. If a bus error occurs during an instruction or data cache refill, the cache line loaded has undefined values in it. Since it is not possible in general to determine the offending address from the EPC, the entire data and instruction cache contents should be invalidated by using the Index Invalidate suboperation of the CACHE instruction. (See the CACHE instruction's definition for details on how to do this.)
5.5.11 System Call Exception
Cause
A SYSCALL exception occurs as a result of executing the SYSCALL instruction. This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 8 (Sys). The EPC register contains the address of the SYSCALL instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and Cause.BD is set.

Servicing
When this exception is recognized, control is transferred to the applicable system routine. To resume execution, the EPC register must be altered so that the SYSCALL instruction does not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC register + 4) before returning. If a SYSCALL instruction is in a branch delay slot, a more complicated algorithm, beyond the scope of this description, may be required.
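The EPC adjustment described above can be sketched in C. This is a minimal illustration under stated assumptions: the function name syscall_resume_epc is hypothetical, cause_bd stands for the Cause.BD bit, and the branch delay slot case is only reported to the caller, not resolved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes the address at which to resume after a
 * SYSCALL. Returns true and stores EPC + 4 in *resume when the SYSCALL
 * was not in a branch delay slot. Returns false (leaving *resume
 * untouched) when Cause.BD is set, because the preceding branch must
 * then be interpreted, which this sketch does not attempt. */
bool syscall_resume_epc(uint32_t epc, bool cause_bd, uint32_t *resume)
{
    if (cause_bd) {
        return false;
    }
    *resume = epc + 4u;
    return true;
}
```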
5.5.12 BREAK Instruction Exception
Cause
A BREAK exception occurs as a result of executing the BREAK instruction. This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 9 (Bp). The EPC register contains the address of the BREAK instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and Cause.BD is set.

Servicing
When a BREAK exception is recognized, control is transferred to the applicable system routine. Additional distinctions can be made by analyzing the unused bits of the BREAK instruction (bits 25:6), and loading the contents of the instruction whose address the EPC register contains. A value of 4 must be added to the contents of the EPC register (EPC register + 4) to locate the instruction if it resides in a branch delay slot. To resume execution, the EPC register must be altered so that the BREAK instruction does not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC register + 4) before returning. If a BREAK instruction is in a branch delay slot, interpretation of the branch instruction is required to resume execution.
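As an illustration of the servicing steps, the following C sketch locates the BREAK instruction and extracts its unused bits (25:6). The function names are hypothetical, and fetching the instruction word from memory is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: address of the BREAK instruction itself.
 * When Cause.BD is set, EPC points at the preceding branch, so the
 * BREAK instruction is at EPC + 4. */
uint32_t break_insn_addr(uint32_t epc, bool cause_bd)
{
    return cause_bd ? epc + 4u : epc;
}

/* Hypothetical helper: the 20-bit field in bits 25:6 of the BREAK
 * instruction word, which software may use to distinguish break types. */
uint32_t break_code(uint32_t insn)
{
    return (insn >> 6) & 0xFFFFFu;
}
```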
5.5.13 Reserved Instruction Exception
Cause
The Reserved Instruction exception occurs when one of the following conditions occurs:
* an attempt is made to execute an instruction with an undefined major opcode (bits 31:26)
* an attempt is made to execute a SPECIAL instruction with an undefined minor opcode (bits 5:0)
* an attempt is made to execute a REGIMM instruction with an undefined minor opcode (bits 20:16)
* an attempt is made to execute a MMI instruction with an undefined minor opcode (bits 10:0)
* an attempt is made to execute a COPz instruction with an undefined minor opcode (bits 25:21)
Note: In the C790, 64-bit operations are always valid in User, Supervisor, and Kernel mode.
This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 10 (RI). The EPC register contains the address of the reserved instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction.
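The opcode fields named in the conditions above can be extracted from a 32-bit instruction word as in the following C sketch. All function names are hypothetical and are introduced only for illustration; deciding which field values are undefined requires the opcode tables, which this sketch does not reproduce.

```c
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of an instruction word.
 * Intended for field widths below 32 bits. */
uint32_t insn_bits(uint32_t insn, unsigned hi, unsigned lo)
{
    return (insn >> lo) & ((UINT32_C(1) << (hi - lo + 1u)) - 1u);
}

/* Field positions as listed in the Reserved Instruction conditions. */
uint32_t major_opcode(uint32_t insn)  { return insn_bits(insn, 31, 26); }
uint32_t special_minor(uint32_t insn) { return insn_bits(insn, 5, 0); }
uint32_t regimm_minor(uint32_t insn)  { return insn_bits(insn, 20, 16); }
uint32_t mmi_minor(uint32_t insn)     { return insn_bits(insn, 10, 0); }
uint32_t copz_minor(uint32_t insn)    { return insn_bits(insn, 25, 21); }
```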
5.5.14 Coprocessor Unusable Exception
Cause
The Coprocessor Unusable exception occurs when an attempt is made to execute a coprocessor instruction for either:
* a corresponding coprocessor unit that has not been marked usable via the Status.CU[ ] bits, or
* COP0 instructions, when the unit has been marked not usable and the process executes in either User or Supervisor mode.
NOTE: COP0 instructions always execute in Kernel mode, regardless of the setting of Status.CU[0]. Also note that the operation of the COP0 instructions EI and DI is not controlled by Status.CU[0]. Instead, the Status.EDI bit specifies whether the EI and DI instructions execute in User and Supervisor modes. When execution is suppressed, EI and DI behave as no-operations in User and Supervisor modes; they do not signal an exception.
The exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 11 (CpU) and the field Cause.CE (Coprocessor Usage Error) is set to indicate which of the four coprocessors was referenced. The EPC register contains the address of the unusable coprocessor instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction.

Servicing
The coprocessor unit to which an attempted reference was made is identified by the CE (Coprocessor Usage Error) field, which results in one of the following situations:
* If the process is entitled access to the coprocessor, the coprocessor is marked usable and the corresponding user state is restored to the coprocessor.
* If the process is entitled access to the coprocessor, but the coprocessor does not exist or has failed, interpretation of the coprocessor instruction is possible.
* If the BD bit is set in the Cause register, the branch instruction must be interpreted; then the coprocessor instruction can be emulated and execution resumed with the EPC register advanced past the coprocessor instruction.
5.5.15 Interrupt Exception
Cause
The Interrupt exception occurs when one of the three interrupt signals is asserted. The significance of the interrupts is dependent upon the specific system implementation. Each of the three interrupts can be masked by clearing the corresponding bit in the IntMask field of the Status register, and all three interrupts can be masked at once by clearing the IE bit or EIE bit of the Status register. All three interrupts are also masked at once when the EXL or ERL bit of the Status register is set to 1. Interrupt IP[7] is set when the Count register is equal to the Compare register.

Exception Level: 1
Vector Address: 0x8000 0200 (BEV = 0), 0xBFC0 0400 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 0 (Int). The IP field of the Cause register indicates current interrupt requests. It is possible that more than one of the bits is set simultaneously (or even that no bits are set) if the interrupt is asserted and then deasserted before this register is read.

Servicing
If the interrupt is hardware-generated, the interrupt condition is cleared by correcting the condition causing the interrupt pin to be asserted. Due to the on-chip write buffer, a store to an external device (possibly clearing the interrupt) may not occur until after other instructions in the pipeline finish. Hence, the user must ensure that the store will occur before the return from exception instruction (ERET) is executed. This can be ensured by executing a SYNC instruction. Otherwise the interrupt may be serviced again even though there is no actual interrupt pending.
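The masking conditions can be summarized by the following C sketch. It is a simplified model under stated assumptions: the structure status_view_t and the function interrupt_taken are hypothetical, the Status bits are held as separate fields instead of at their real bit positions, and the pending and int_mask values are assumed to use the same bit numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the Status register fields that gate interrupts. */
typedef struct {
    bool ie;          /* Status.IE  */
    bool eie;         /* Status.EIE */
    bool exl;         /* Status.EXL */
    bool erl;         /* Status.ERL */
    uint8_t int_mask; /* Status IntMask bits, one per interrupt source */
} status_view_t;

/* Returns true when at least one pending interrupt is not masked. */
bool interrupt_taken(const status_view_t *s, uint8_t pending)
{
    /* Clearing IE or EIE, or setting EXL or ERL, masks all interrupts. */
    if (!s->ie || !s->eie || s->exl || s->erl) {
        return false;
    }
    /* Otherwise each source is gated by its own IntMask bit. */
    return (pending & s->int_mask) != 0;
}
```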
5.5.16 SIO Exception
Cause
The SIO exception occurs when the SIOInt signal is asserted. This exception is maskable by setting the Status.ERL bit.

Exception Level: 2
Vector Address: 0x8000 0100 (DEV = 0), 0xBFC0 0300 (DEV = 1)

Processing
The value of Cause.EXC2 is set to 3 (Dbg). Cause.SIOP is set to 1. The ErrorEPC register contains the address of the instruction where the SIO exception was detected unless it is in a branch delay slot, in which case the ErrorEPC register contains the address of the preceding branch instruction and Cause.BD2 is set.

Servicing
When this exception is recognized, control is transferred to the applicable service routine.
5.5.17 Integer Overflow Exception
Cause
An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, DADDI or DSUB instruction results in a 2's complement overflow. This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 12 (Ov). The EPC register contains the address of the instruction that caused the exception unless the instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.
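For reference, the 2's complement overflow condition for a 32-bit addition can be expressed in C as shown below. This sketch illustrates the arithmetic condition only; it is not a description of the C790 hardware, and the function name add32_overflows is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a + b overflows as a 32-bit 2's complement sum.
 * The addition is done on unsigned values to avoid undefined behavior.
 * Overflow occurs when both operands have the same sign and the sign
 * of the result differs from it. */
bool add32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;
    return ((~(ua ^ ub)) & (ua ^ sum) & 0x80000000u) != 0;
}
```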
5.5.18 Trap Exception
Cause
The TRAP exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI, TGEIU, TLTI, TLTIU, TEQI, or TNEI instruction results in a TRUE condition. This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The value of Cause.ExcCode is set to 13 (Tr). The EPC register contains the address of the instruction causing the exception unless the instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and Cause.BD is set.
5.5.19 Floating-Point Exception
Cause
The Floating-Point exception is used by the floating-point coprocessor. This exception is not maskable.

Exception Level: 1
Vector Address: 0x8000 0180 (BEV = 0), 0xBFC0 0380 (BEV = 1)

Processing
The common exception vector is used for this exception, and the FPE code in the Cause register is set. The contents of the Floating-Point Control/Status register indicate the cause of this exception. This exception is cleared by clearing the appropriate bit in the Floating-Point Control/Status register. For an unimplemented instruction exception, the kernel should emulate the instruction; for other exceptions, the kernel should pass the exception to the user program that caused the exception.
Chapter 6 Memory Management
6. Memory Management
The C790 processor provides a memory management unit (MMU) which uses an on-chip translation look-aside buffer (TLB) to translate virtual addresses into physical addresses. The C790 supports the MIPS compatible 32-bit address and 64-bit data mode. Only 32-bit virtual and physical addresses have been implemented. There is no requirement for address sign extension, and address error exception checking is not done on the upper 32 bits, which are ignored. The only conditions that generate the address error exception are address alignment errors and segment protection errors. In Kernel mode, the program counter wraps around from kseg3 to kuseg without an address error exception.

Since there is only one addressing mode, all four MIPS ISAs (I, II, III, IV) and the C790-specific ISA are available without restriction in all three processor modes (subject to the appropriate MIPS ISA coprocessor usable restrictions). As such, the reserved instruction (RI) exception occurs only when the processor tries to execute an undefined opcode.

This chapter describes the processor virtual and physical address spaces, the virtual-to-physical address translation, the operation of the TLB in making these translations, and those System Control Coprocessor (COP0) registers that provide the software interface to the TLB.
6.1 Translation Look-aside Buffer (TLB)
Mapped virtual addresses are translated into physical addresses using an on-chip TLB. The TLB is a fully associative memory that holds 48 entries, which provide mapping to 48 odd/even page pairs (96 pages). When address mapping is indicated, each TLB entry is checked simultaneously for a match with the virtual address that is extended with an ASID stored in the low 8 bits of the EntryHi register. The size of a mapped page ranges from 4 KB to 16 MB, in multiples of four; that is, 4K, 16K, 64K, 256K, 1M, 4M, 16M.
6.1.1 Translation Status
In the C790 processor, as in the R4000, each TLB entry holds two sets of mapping information for an odd/even page pair. The translation result is categorized into three states: hit, miss and invalid.

Upon address translation, if there is no virtual address match in any of the 48 entries, the translation result is categorized as a TLB miss. In this case, an exception is taken and software refills the TLB from the page table resident in memory. Software can write over a selected TLB entry or use a hardware mechanism to write into a random entry.

If there is a match on translation, the following takes place in the TLB hardware.
1. The translation information for the odd page and the even page is read out of the matching entry. The page size is extracted at the same time.
2. The TLB selects one of the two sets of translation information in accordance with the page size extracted above and the virtual address. This becomes the translation result in the TLB.

The translation result includes a valid flag that indicates whether the translation information is valid. If the flag is marked as `valid', the translation is handled as a TLB hit. The physical page number is extracted from the TLB and concatenated with the offset to form the physical address (see Figure 6-1). If the flag is marked as `invalid', the translation result is recognized as TLB invalid. In this case, an exception is taken to request that software update the entry that matched, by probing the TLB using the TLBP operation.
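The selection between the even and odd page of a matching entry can be sketched in C as follows. The text does not state which address bit performs the selection; this sketch assumes, as in R4000-style TLBs, that it is the virtual address bit immediately above the page offset. The type page_select_t and the function select_page are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of splitting a virtual address for a given page size. */
typedef struct {
    bool odd;        /* true: odd page of the pair, false: even page */
    uint32_t offset; /* byte offset within the selected page */
} page_select_t;

/* page_size is the size of one page in bytes (4 KB to 16 MB) and must
 * be a power of two. ASSUMPTION: the bit just above the offset selects
 * the odd or even page. */
page_select_t select_page(uint32_t va, uint32_t page_size)
{
    page_select_t r;
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    r.odd = (va & page_size) != 0;
    r.offset = va & (page_size - 1u);
    return r;
}
```

With 4-KB pages this corresponds to bit 12 of the virtual address, which is consistent with the VPN2 field occupying bits 31:13 of the EntryHi register.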
6.1.2 Multiple Matches
A multiple match is the condition in which two or more entries match upon address translation. This is strictly prohibited and software is expected never to allow it to occur. The C790 processor does NOT provide any means of detecting this condition in hardware, such as TLB shutdown. The result of this condition is undefined and further execution may produce incorrect results.
6.2 Address Spaces
This section describes the virtual and physical address spaces and the manner in which virtual addresses are converted or "translated" into physical addresses in the TLB.
6.2.1 Virtual Address Space
The C790 only implements 32 bits of virtual address space. There is no requirement for address sign extension and no checking will be done on the upper 32 bits of the address. Figure 6-1 shows the translation of a virtual address into a physical address.
[Figure: a virtual address (ASID, VPN, Offset) is compared with the TLB entry fields (G, ASID, VPN, PFN) to produce a physical address (PFN, Offset).]
1. Virtual address (VA) represented by the virtual page number (VPN) is concatenated with the ASID and compared with the tags in the TLB.
2. If there is a match, the page frame number (PFN) representing the upper bits of the physical address (PA) is output from the TLB.
4. The Offset, which does not pass through the TLB, is then concatenated to the PFN.
Figure 6-1. Overview of a Virtual-to-Physical Address Translation
As shown in Figure 6-2, the virtual address is extended with an 8-bit address space identifier (ASID), which reduces the frequency of TLB flushing when switching contexts. This 8-bit ASID is in the COP0 EntryHi register as described later in this chapter.
6.2.2 Physical Address Space
Using a 32-bit address, the processor physical address space encompasses 4 GB. The following section describes the translation of a virtual address to a physical address.
6.2.3 Virtual-to-Physical Address Translation
Converting a virtual address to a physical address begins by comparing the virtual address from the processor with the virtual addresses in the TLB; there is a match when the virtual page number (VPN) of the address is the same as the VPN field of the entry, and either:
* the Global (G) bit of the TLB entry is set, or
* the ASID field of the virtual address (taken from the 8-bit ASID field of the EntryHi register) is the same as the ASID field of the TLB entry.
If there is no match, a TLB Miss exception is taken by the processor and software can refill the TLB from a page table of virtual / physical addresses in memory. If there is a virtual address match in the TLB, the physical address is output from the TLB and concatenated with the Offset, which represents an address within the page frame space. The Offset does not pass through the TLB. At the same time, the valid bit output from TLB is checked to qualify the translation. If this bit is not set, a TLB Invalid exception is taken by the processor and software can update the TLB. Virtual-to-physical translation is described in greater detail throughout the remainder of this chapter. Figure 6-9, shown at the end of this chapter, is a detailed flow diagram of this process.
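The match and validity checks described in this section can be modeled in C as shown below. This is a simplified software sketch, not the hardware implementation: the entry type tlb_entry_t and the function tlb_lookup are hypothetical, each entry is reduced to a single page of a fixed size, and the entries are scanned in a loop whereas the hardware compares all entries simultaneously. Because multiple matches are prohibited, the sketch simply returns on the first match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TLB_MISS, TLB_INVALID, TLB_HIT } tlb_status_t;

/* Hypothetical, simplified TLB entry (one page per entry). */
typedef struct {
    uint32_t vpn;  /* virtual page number tag */
    uint8_t asid;  /* address space identifier tag */
    bool global;   /* G bit: ignore the ASID when set */
    bool valid;    /* V bit */
    uint32_t pfn;  /* page frame number */
} tlb_entry_t;

/* Looks up vpn/asid. On TLB_HIT, *pfn receives the page frame number. */
tlb_status_t tlb_lookup(const tlb_entry_t *tlb, size_t n,
                        uint32_t vpn, uint8_t asid, uint32_t *pfn)
{
    for (size_t i = 0; i < n; i++) {
        const tlb_entry_t *e = &tlb[i];
        if (e->vpn != vpn) {
            continue;               /* VPN must match */
        }
        if (!e->global && e->asid != asid) {
            continue;               /* need G set or an ASID match */
        }
        if (!e->valid) {
            return TLB_INVALID;     /* match, but V bit is clear */
        }
        *pfn = e->pfn;
        return TLB_HIT;
    }
    return TLB_MISS;                /* no entry matched */
}
```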
6.2.4 32-bit Address Translation Mode
The C790 supports only 32-bit address translation mode. 64-bit addressing mode is not supported. Figure 6-2 shows the virtual-to-physical address translation of a 32-bit address.
* The top portion of Figure 6-2 shows a virtual address with a 12-bit, or 4-KB, page size, labeled Offset. The remaining 20 bits of the address represent the VPN, and index the 1M-entry page table.
* The bottom portion of Figure 6-2 shows a virtual address with a 24-bit, or 16-MB, page size, labeled Offset. The remaining 8 bits of the address represent the VPN, and index the 256-entry page table.
[Figure: two virtual address layouts, each translated through the TLB into a 32-bit physical address (PFN, Offset).]
Virtual address with 1M (2^20) 4-Kbyte pages:
  bits 39:32 ASID (8 bits), bits 31:12 VPN (20 bits), bits 11:0 Offset (12 bits)
Virtual address with 256 (2^8) 16-Mbyte pages:
  bits 39:32 ASID (8 bits), bits 31:24 VPN (8 bits), bits 23:0 Offset (24 bits)
Bits 31, 30 and 29 of the virtual address select user, supervisor, or kernel address spaces. The VPN undergoes virtual-to-physical translation in the TLB; the Offset is passed unchanged to physical memory.
Figure 6-2. 32-bit Mode Virtual Address Translation
6.2.5 Operating Modes
The processor has the three standard MIPS operating modes:
* User mode
* Supervisor mode
* Kernel mode
Selection between the three modes can be made by the operating system (when in Kernel mode) by writing into Status register's KSU field. The processor is forced into Kernel mode when the processor is handling a Level 1 exception (the EXL bit is set - also called the Exception Level mode in R-series processors) or a Level 2 exception (the ERL bit is set - also called the Error Level mode in R-series processors). In the following table, dashes represent `don't cares'.
Table 6-1. Processor Modes
Description                               KSU   ERL   EXL
32-bit User mode                          10    0     0
32-bit Supervisor mode                    01    0     0
32-bit Kernel mode                        00    0     0
32-bit Kernel mode (Level 1 exception)    -     0     1
32-bit Kernel mode (Level 2 exception)    -     1     -
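The mode selection in Table 6-1 can be expressed as a small C function. This is an illustrative sketch; the type cpu_mode_t and the function current_mode are hypothetical, and the KSU value 11 (binary), which the table does not list, is reported as undefined.

```c
#include <stdbool.h>

typedef enum {
    MODE_USER,
    MODE_SUPERVISOR,
    MODE_KERNEL,
    MODE_UNDEFINED /* KSU encoding not listed in Table 6-1 */
} cpu_mode_t;

/* ksu is the 2-bit Status.KSU field; exl and erl are Status.EXL and
 * Status.ERL. */
cpu_mode_t current_mode(unsigned ksu, bool exl, bool erl)
{
    /* A Level 1 or Level 2 exception forces Kernel mode; KSU is a
     * don't care in these rows. */
    if (erl || exl) {
        return MODE_KERNEL;
    }
    switch (ksu & 3u) {
    case 0u: return MODE_KERNEL;
    case 1u: return MODE_SUPERVISOR;
    case 2u: return MODE_USER;
    default: return MODE_UNDEFINED;
    }
}
```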
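Figure 6-3 shows the state transitions among these three modes.
[Figure: state diagram. User Mode to Kernel Mode on Exception; Kernel Mode to User Mode on ERET & KSU = 10; Supervisor Mode to Kernel Mode on Exception; Kernel Mode to Supervisor Mode on ERET & KSU = 01.]
Figure 6-3. State Transition among Operating Modes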
Table 6-2 summarizes address space for each operating mode.
Table 6-2. Address Space
Virtual Address              32-bit User Mode      32-bit Supervisor Mode   32-bit Kernel Mode
0xFFFF FFFF to 0xE000 0000   Address Error         Address Error            kseg3 (0.5 GB) Mapped
0xDFFF FFFF to 0xC000 0000   Address Error         sseg (0.5 GB) Mapped     ksseg (0.5 GB) Mapped
0xBFFF FFFF to 0xA000 0000   Address Error         Address Error            kseg1 (0.5 GB) Unmapped* Uncached
0x9FFF FFFF to 0x8000 0000   Address Error         Address Error            kseg0 (0.5 GB) Unmapped* Cached**
0x7FFF FFFF to 0x0000 0000   useg (2 GB) Mapped    suseg (2 GB) Mapped      kuseg (2 GB) Mapped (becomes unmapped if ERL is 1)
* Note: Virtual addresses of the Kernel segments kseg0 and kseg1 are not mapped through the TLB and are always translated into physical addresses from 0x0000 0000 to 0x1FFF FFFF.
** Note: The kseg0 cache algorithm is controlled by the K0 field in the Config register.
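The Address Error entries of Table 6-2 can be checked with the following C sketch, which decides whether a virtual address lies in a segment that is accessible in a given mode. The names access_mode_t and segment_accessible are hypothetical, and alignment checks, which can also raise an Address Error, are outside the scope of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACC_USER, ACC_SUPERVISOR, ACC_KERNEL } access_mode_t;

/* Returns true when va falls in a segment that Table 6-2 lists as
 * accessible in the given mode; false corresponds to "Address Error". */
bool segment_accessible(uint32_t va, access_mode_t mode)
{
    unsigned top = va >> 29; /* A[31:29] */

    if (top <= 3u) {
        return true;                 /* useg / suseg / kuseg */
    }
    if (top == 6u) {
        return mode != ACC_USER;     /* sseg / ksseg */
    }
    return mode == ACC_KERNEL;       /* kseg0, kseg1, kseg3 */
}
```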
6.2.6 User Mode Operations
In User mode, a single, uniform virtual address space, labeled User segment, is available; its size is:
* 2 GB (2^31 bytes) (useg)
Figure 6-4 shows the User mode virtual address space.
[Figure: User mode 32-bit virtual address map.]
0x8000 0000 to 0xFFFF FFFF   Address Error
0x0000 0000 to 0x7FFF FFFF   2 GB Mapped (useg)
Figure 6-4. User Mode Virtual Address Space
The User segment starts at address 0x0000 0000 and the current active user process resides in useg. The TLB identically maps all references to useg from all modes, and controls cache accessibility. The processor operates in User mode when the Status register contains the following bit values:
* KSU bits = 10₂ and
* EXL = 0 and
* ERL = 0
Table 6-3 lists the characteristics of the User mode segment, useg.
Table 6-3. User Mode Segments
Address Bit Values   KSU   EXL   ERL   Segment Name   Virtual Address Range             Segment Size
A[31] = 0            10₂   0     0     useg           0x0000 0000 through 0x7FFF FFFF   2 Gbyte (2^31 bytes)
(KSU, EXL and ERL are Status register bit values.)
User Mode, User Space (useg)
In User mode (KSU = 10₂ in the Status register), when the most-significant bit of the 32-bit virtual address is set to 0, the useg virtual address space is selected; it covers the 2^31 bytes (2 GB) of the current user address space. All valid User mode virtual addresses have their most-significant bit cleared to 0; any attempt to reference an address with the most-significant bit set while in User mode causes an Address Error exception. The system maps all references to useg through the TLB. Bit settings within the TLB entry for the page determine the cacheability of a reference. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space starts at virtual address 0x0000 0000 and runs through 0x7FFF FFFF.
6.2.7 Supervisor Mode Operations
Supervisor mode is designed for layered operating systems in which a true kernel runs in C790 Kernel mode, and the rest of the operating system runs in Supervisor mode. The processor operates in Supervisor mode when the Status register contains the following bit values:
* KSU = 01₂ and
* EXL = 0 and
* ERL = 0
[Figure: Supervisor mode 32-bit virtual address map.]
0xE000 0000 to 0xFFFF FFFF   Address error
0xC000 0000 to 0xDFFF FFFF   0.5 GB Mapped (sseg)
0xA000 0000 to 0xBFFF FFFF   Address error
0x8000 0000 to 0x9FFF FFFF   Address error
0x0000 0000 to 0x7FFF FFFF   2 GB Mapped (suseg)
Figure 6-5. Supervisor Mode Virtual Address Space

Table 6-4. Supervisor Mode Segments
Address Bit Values   KSU   EXL   ERL   Segment Name   Virtual Address Range             Segment Size
A[31] = 0            01₂   0     0     suseg          0x0000 0000 through 0x7FFF FFFF   2 Gbyte (2^31 bytes)
A[31:29] = 110₂      01₂   0     0     sseg           0xC000 0000 through 0xDFFF FFFF   0.5 Gbyte (2^29 bytes)
(KSU, EXL and ERL are Status register bit values.)
Supervisor Mode, User Space (suseg)
In Supervisor mode (KSU = 01₂ in the Status register), when the most-significant bit of the 32-bit virtual address is set to 0, the suseg virtual address space is selected; it covers the 2^31 bytes (2 Gbytes) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space starts at virtual address 0x0000 0000 and runs through 0x7FFF FFFF.

Supervisor Mode, Supervisor Space (sseg)
In Supervisor mode (KSU = 01₂ in the Status register), when the three most-significant bits of the 32-bit virtual address are 110₂, the sseg virtual address space is selected; it covers 2^29 bytes (512 Mbytes) of the current supervisor address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space begins at virtual address 0xC000 0000 and runs through 0xDFFF FFFF.
6.2.8 Kernel Mode Operations
The processor operates in Kernel mode when the Status register contains one of the following values:
* KSU = 00₂ or
* EXL = 1 or
* ERL = 1
The processor enters Kernel mode whenever an exception is detected and it remains in Kernel mode until an Exception Return (ERET) instruction is executed. The ERET instruction restores the processor to the mode existing prior to the exception. Kernel mode virtual address space is divided into regions differentiated by the high-order bits of the virtual address, as shown in Figure 6-6. Table 6-5 lists the characteristics of the kernel mode segments.
[Figure: Kernel mode 32-bit virtual address map and its relation to the 32-bit physical address space (0x0000 0000 to 0xFFFF FFFF).]
0xE000 0000 to 0xFFFF FFFF   kseg3   0.5 GB Mapped (translated by TLB)
0xC000 0000 to 0xDFFF FFFF   ksseg   0.5 GB Mapped (translated by TLB)
0xA000 0000 to 0xBFFF FFFF   kseg1   0.5 GB Unmapped, Uncached
0x8000 0000 to 0x9FFF FFFF   kseg0   0.5 GB Unmapped, Cached
0x0000 0000 to 0x7FFF FFFF   kuseg   2 GB Mapped (becomes unmapped if ERL = 1; translated by TLB)
Physical addresses 0x0000 0000 to 0x1FFF FFFF: 0.5 GB Kernel Boot and I/O
Figure 6-6. Kernel Mode Address Space
Table 6-5. Kernel Mode Segments
Address Bit Values   Segment Name   Virtual Address Range             Segment Size
A[31] = 0            kuseg          0x0000 0000 through 0x7FFF FFFF   2 Gbyte (2^31 bytes)
A[31:29] = 100₂      kseg0          0x8000 0000 through 0x9FFF FFFF   0.5 Gbyte (2^29 bytes)
A[31:29] = 101₂      kseg1          0xA000 0000 through 0xBFFF FFFF   0.5 Gbyte (2^29 bytes)
A[31:29] = 110₂      ksseg          0xC000 0000 through 0xDFFF FFFF   0.5 Gbyte (2^29 bytes)
A[31:29] = 111₂      kseg3          0xE000 0000 through 0xFFFF FFFF   0.5 Gbyte (2^29 bytes)
Status register bit values (KSU, EXL, ERL) for all segments: KSU = 00₂ or EXL = 1 or ERL = 1
Kernel Mode, User Space (kuseg)
In Kernel mode (KSU = 00₂ or EXL = 1 or ERL = 1 in the Status register), when the most-significant bit of the virtual address, A[31], is a 0, the 32-bit kuseg virtual address space is selected; it covers the full 2^31 bytes (2 GB) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. When ERL = 1 in the Status register, the user address region, kuseg, becomes a 2^31-byte unmapped, uncached address space (that is, mapped directly to physical addresses 0x0000 0000 through 0x7FFF FFFF).

Kernel Mode, Kernel Space 0 (kseg0)
In Kernel mode (KSU = 00₂ or EXL = 1 or ERL = 1 in the Status register), when the most-significant three bits of the virtual address are 100₂, the 32-bit kseg0 virtual address space is selected; it is the 2^29-byte (512 MB) kernel physical space. References to kseg0 are not mapped through the TLB; the physical address selected is defined by subtracting 0x8000 0000 from the virtual address. The K0 field of the Config register, described in this chapter, controls cacheability and coherency.

Kernel Mode, Kernel Space 1 (kseg1)
In Kernel mode (KSU = 00₂ or EXL = 1 or ERL = 1 in the Status register), when the most-significant three bits of the 32-bit virtual address are 101₂, the 32-bit kseg1 virtual address space is selected; it is the 2^29-byte (512 MB) kernel physical space. References to kseg1 are not mapped through the TLB; the physical address selected is defined by subtracting 0xA000 0000 from the virtual address. Caches are disabled for accesses to these addresses, and physical memory (or memory-mapped I/O device registers) is accessed directly.

Kernel Mode, Supervisor Space (ksseg)
In Kernel mode (KSU = 00₂ in the Status register), when the most-significant three bits of the 32-bit virtual address are 110₂, the ksseg virtual address space is selected; it is the current 2^29-byte (512 MB) supervisor virtual space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.
Kernel Mode, Kernel Space 3 (kseg3)
In Kernel mode (KSU = 00₂ in the Status register), when the most-significant three bits of the 32-bit virtual address are 111₂, the kseg3 virtual address space is selected; it is the current 2^29-byte (512 MB) kernel virtual space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.
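The Kernel mode segment rules above can be summarized by the following C sketch, which either computes the physical address of an unmapped reference or reports that the TLB must be consulted. The names xlate_kind_t and kernel_translate are hypothetical; cache attributes and the TLB lookup itself are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { XLATE_UNMAPPED, XLATE_TLB } xlate_kind_t;

/* For a Kernel mode reference to va: returns XLATE_UNMAPPED and stores
 * the physical address in *pa when the segment bypasses the TLB;
 * returns XLATE_TLB (leaving *pa untouched) when the TLB is used.
 * erl is the Status.ERL bit. */
xlate_kind_t kernel_translate(uint32_t va, bool erl, uint32_t *pa)
{
    switch (va >> 29) {              /* A[31:29] */
    case 4u:                         /* kseg0: unmapped, cached */
        *pa = va - 0x80000000u;
        return XLATE_UNMAPPED;
    case 5u:                         /* kseg1: unmapped, uncached */
        *pa = va - 0xA0000000u;
        return XLATE_UNMAPPED;
    case 6u:                         /* ksseg: mapped */
    case 7u:                         /* kseg3: mapped */
        return XLATE_TLB;
    default:                         /* kuseg (A[31] = 0) */
        if (erl) {
            *pa = va;                /* unmapped while ERL = 1 */
            return XLATE_UNMAPPED;
        }
        return XLATE_TLB;
    }
}
```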
6.3 System Control Coprocessor
The System Control Coprocessor (COP0) is implemented as an integral part of the CPU, and supports memory management, address translation, exception handling, and other privileged operations. The COP0 registers shown in Figure 6-7 plus a 48-entry TLB make up the MMU. Each COP0 register has a unique number that identifies it; this number is referred to as the register number. For instance, the PageMask register is register number 5.
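[Figure: the COP0 registers used by the MMU, shown beside the TLB. Register numbers: Index 0, Random 1, EntryLo0 2, EntryLo1 3, Context 4, PageMask 5, Wired 6, BadVAddr 8, EntryHi 10, Status 12. The TLB entries are numbered 0 through 47, and each entry is 128 bits wide (bits 127:0). The "safe" entries are marked; see the Random register and the contents of the Wired register.]
Figure 6-7. COP0 Registers and the TLB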
6.3.1
Format of a TLB Entry
Figure 6-8 shows the TLB entry formats for the 32-bit address translation modes. Each field of an entry has a corresponding field in the EntryHi, EntryLo0, EntryLo1, or PageMask registers. For example, the Mask field of the TLB entry is also held in the PageMask register.
32-bit Mode (128-bit TLB entry in 32-bit mode of the C790 processor)

Bits      Field   Width
127:121   0       7
120:109   MASK    12
108:96    0       13
95:77     VPN2    19
76        G       1
75:72     0       4
71:64     ASID    8
63:58     0       6
57:38     PFN     20
37:35     C       3
34        D       1
33        V       1
32        0       1
31:26     0       6
25:6      PFN     20
5:3       C       3
2         D       1
1         V       1
0         0       1

Figure 6-8. Format of a TLB Entry
The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers are nearly the same as that of the TLB entry. The one exception is the Global field (G bit), which is used in the TLB, but is reserved in the EntryHi register. The following register tables describe the TLB entry fields shown in Figure 6-8.
PageMask Register

Bits    Field   Width   Description
31:25   0       7       Reserved. Must be written as zeroes, and returns zeroes when read.
24:13   MASK    12      Page comparison mask.
12:0    0       13      Reserved. Must be written as zeroes, and returns zeroes when read.
EntryHi Register

Bits    Field   Width   Description
31:13   VPN2    19      Virtual page number divided by two (maps to two pages).
12:8    0       5       Reserved. Must be written as zeroes, and returns zeroes when read.
7:0     ASID    8       Address space ID field. An 8-bit field that lets multiple processes share the TLB; each process has a distinct mapping of otherwise identical virtual page numbers.
EntryLo0 and EntryLo1 Registers (both registers have the same format)

Bits    Field   Width   Description
31:26   0       6       Reserved. Must be written as zeroes, and returns zeroes when read.
25:6    PFN     20      Page frame number; the upper bits of the physical address.
5:3     C       3       Specifies the TLB page coherency attribute; see Table 6-6.
2       D       1       Dirty. If this bit is set, the page is marked as dirty and, therefore, writable. This bit is actually a write-protect bit that software can use to prevent alteration of data.
1       V       1       Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise, a TLB invalid exception occurs.
0       G       1       Global. If this bit is set in both EntryLo0 and EntryLo1, then the processor ignores the ASID during TLB lookup.
The TLB page coherency attribute (C) bits specify whether references to the page are cached, uncached, or uncached accelerated. Table 6-6 shows the coherency attributes selected by the C bits.
Table 6-6. TLB Page Coherency (C) Bit Values
C[5:3] Value   Page Coherency Attribute
0              Reserved
1              Reserved
2              Uncached
3              Cacheable, write-back, write-allocate
4              Reserved
5              Reserved
6              Reserved
7              Uncached, Accelerated
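As an illustration of the EntryLo0/EntryLo1 layout and the C-bit encoding, the following C sketch packs and unpacks the fields (PFN in bits 25:6, C in bits 5:3, D in bit 2, V in bit 1, G in bit 0). The type entrylo_t and the function names are hypothetical helpers, not part of any C790 software interface.

```c
#include <stdint.h>

/* Decoded EntryLo0/EntryLo1 fields (hypothetical helper type). */
typedef struct {
    uint32_t pfn;   /* bits 25:6, page frame number */
    unsigned c;     /* bits 5:3, coherency attribute */
    unsigned d;     /* bit 2, dirty (write enable) */
    unsigned v;     /* bit 1, valid */
    unsigned g;     /* bit 0, global */
} entrylo_t;

static entrylo_t entrylo_unpack(uint32_t r)
{
    entrylo_t e;
    e.pfn = (r >> 6) & 0xFFFFFu;
    e.c   = (r >> 3) & 7u;
    e.d   = (r >> 2) & 1u;
    e.v   = (r >> 1) & 1u;
    e.g   = r & 1u;
    return e;
}

/* Reserved bits 31:26 are always written as zero. */
static uint32_t entrylo_pack(entrylo_t e)
{
    return ((e.pfn & 0xFFFFFu) << 6) | ((e.c & 7u) << 3) |
           ((e.d & 1u) << 2) | ((e.v & 1u) << 1) | (e.g & 1u);
}

/* Name of the coherency attribute selected by the C bits. */
static const char *coherency_name(unsigned c)
{
    switch (c & 7u) {
    case 2:  return "Uncached";
    case 3:  return "Cacheable, write-back, write-allocate";
    case 7:  return "Uncached, Accelerated";
    default: return "Reserved";
    }
}
```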
Write-back with allocate fetches the line with the missed data both on load misses and on store misses. Therefore, storing data to such pages is always performed to the data cache and will not be sent to the write buffer.
Uncached accelerated data provides a special kind of acceleration for handling uncached data. On a load of an uncached accelerated data item (which can range in size from a byte to a quadword) the C790 will always fetch an aligned 128-byte quantity from memory. These eight quadwords will be placed in a special 128-byte buffer in the CPU called the uncached accelerated buffer, or UCAB. Any subsequent loads which "hit" the UCAB will get the data from the UCAB. This process reduces bus traffic. The UCAB will be invalidated under the following conditions:
* Any load operation which doesn't hit the buffer, or
* any store operation, or
* a SYNC (or SYNC.L) operation, or
* any exception.
For uncached accelerated stores, the C790 write-back buffer (128-bit x 8) also has some special features. On the first store of an uncached accelerated write the write-back buffer will mark the fact that this is an uncached accelerated write to a particular address. Subsequent uncached accelerated stores which hit within the same 128-bit address boundary will be accumulated (gathered) within the same write buffer entry. This process of data gathering reduces bus traffic. The gathering process will be terminated under the following conditions:
* Any store which can't be gathered (different attribute or different address), or
* any load operation, or
* a SYNC (or SYNC.L) operation, or
* any exception.
6.4 Virtual-to-Physical Address Translation Process
In the supported 32-bit mode, the highest 8 to 20 bits of the virtual address (depending upon the page size) are compared to the contents of the TLB virtual page number. The 8-bit ASID is only compared if the global bit, G, is not set. If a TLB entry matches, the physical address and access control bits (C, D, and V) are retrieved from the matching TLB entry. While the V bit of the entry must be set for a valid translation to take place, it is not involved in the determination of a matching TLB entry. Figure 6-9 illustrates the TLB address translation process.
[Figure: flowchart of TLB address translation. The virtual address (input) is first checked against the address spaces that are valid for the current mode (see the section describing Operating Modes in this chapter); an access that is not allowed in User or Supervisor mode raises an Address Error exception. An address outside a mapped area is an unmapped access and bypasses the TLB. For a mapped area, the VPN and ASID are compared with the TLB entries: an entry matches if the VPN matches and either G = 1 or the ASID matches. If no entry matches, a TLB Refill exception is raised. For a matching entry, V = 0 raises a TLB Invalid exception, and a write with D = 0 raises a TLB Mod exception. Otherwise the physical address (output) accesses main memory if C = 010 or 111 (noncacheable), and the cache if not.]
Figure 6-9. TLB Address Translation
If there is no TLB entry that matches the virtual address, a TLB miss exception occurs. If the access control bits (D and V) indicate that the access is not valid, a TLB modified or TLB invalid exception occurs. If the C bits equal 010 (binary, Uncached) or 111 (binary, Uncached Accelerated), the physical address that is generated directly accesses main memory, bypassing the cache.
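The lookup rules in this section can be sketched as a small software model. This is a hedged illustration, not the hardware implementation: the type and function names are hypothetical, and the mask arithmetic and the even/odd page selection follow the usual MIPS convention for paired TLB entries, which is an assumption here rather than something stated in this section.

```c
#include <stdint.h>

/* Software model of one TLB entry (hypothetical layout for illustration). */
typedef struct {
    uint32_t mask;   /* 12-bit MASK field (PageMask bits 24:13) */
    uint32_t vpn2;   /* 19-bit VPN2 (virtual address bits 31:13) */
    uint8_t  asid;   /* 8-bit address space ID */
    uint8_t  g;      /* global bit */
    struct { uint32_t pfn; uint8_t c, d, v; } lo[2];  /* even and odd page */
} tlb_entry_t;

typedef enum { XLATE_OK, XLATE_REFILL, XLATE_INVALID, XLATE_MODIFIED } xlate_t;

static xlate_t tlb_translate(const tlb_entry_t *tlb, int n, uint32_t va,
                             uint8_t asid, int is_write,
                             uint32_t *pa, int *uncached)
{
    for (int i = 0; i < n; i++) {
        const tlb_entry_t *e = &tlb[i];
        uint32_t cmp = ~e->mask & 0x7FFFFu;       /* VPN2 bits that are compared */
        if (((va >> 13) & cmp) != (e->vpn2 & cmp))
            continue;                             /* VPN does not match */
        if (!e->g && e->asid != asid)
            continue;                             /* ASID compared only if G = 0 */

        /* Assumed convention: the bit just above the page offset selects
         * the even (EntryLo0) or odd (EntryLo1) page of the pair. */
        uint32_t page = (((e->mask << 13) | 0x1FFFu) + 1u) >> 1;
        int odd = (va & page) != 0;

        if (!e->lo[odd].v)
            return XLATE_INVALID;                 /* TLB invalid exception */
        if (is_write && !e->lo[odd].d)
            return XLATE_MODIFIED;                /* TLB modified exception */

        *pa = ((e->lo[odd].pfn << 12) & ~(page - 1u)) | (va & (page - 1u));
        *uncached = (e->lo[odd].c == 2 || e->lo[odd].c == 7);
        return XLATE_OK;
    }
    return XLATE_REFILL;                          /* no matching entry */
}
```

Note that the V bit is checked only after a match has been found, which mirrors the statement that V is not involved in determining a matching entry.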
6.5 TLB Instructions
Table 6-7 lists the instructions that the CPU provides for working with the TLB. See Appendix C for a detailed description of these instructions.
Table 6-7. TLB Instructions
OpCode   Description of Instruction
TLBP     Translation Look-aside Buffer Probe
TLBR     Translation Look-aside Buffer Read
TLBWI    Translation Look-aside Buffer Write Index
TLBWR    Translation Look-aside Buffer Write Random
Chapter 7 Caches
7. Caches
The C790 core contains both an instruction cache and a separate data cache. The processor also contains a small read-only cache memory for the uncached accelerated area. This chapter describes the cache structures, operation of the caches, and cache control.
7.1 Cache Features
The two caches are configured as shown in Table 7-1:
Table 7-1. Cache Configuration
Cache               Size    Organization   Line Size   Refill Size
Instruction Cache   32 KB   2-Way          64 bytes    64 bytes
Data Cache          32 KB   2-Way          64 bytes    64 bytes

The following are the main features of the caches:
* Separate Instruction Cache and Data Cache
* Virtually indexed and physically tagged caches
* 64 Byte line size
* 64 Byte Refill size
* 2-way set-associative cache for higher performance
* Write-back policy for the Data Cache
* Missed quadword first sequential order burst refills for the Data Cache
* Data Cache line locking
* Non-Blocking Loads
* Data cache supports multiple Hits under a single miss
* No Snoop capability
No cache snoop capability has been provided. The user may choose to use CACHE instructions to keep coherency between caches and main memory.
7.2 Organization of the Caches
Organization of the caches is illustrated in Figure 7-1 and Figure 7-2. Both the Instruction Cache and the Data Cache are 2-way set-associative. Each cache line consists of a tag and data. Each cache has a data line size of 64 bytes.
7.2.1
Data Cache
The Data Cache is connected to the CPU via a 128-bit bus. Therefore, the Data Cache can supply to the CPU or the coprocessors up to a quadword of data per access. The following diagram shows Data Cache structure. Tags are discussed in detail in a later section.
[Figure: organization of the Data Cache. Each of the two ways (Way0 and Way1) has 256 entries selected by the virtual index. Each entry holds a physical tag (state bits L, R, V, D and a 20-bit PFN) and 64 bytes of data.
L   Lock Bit    For description, see Section 7.3.7, Data Cache Lock Function
R   LRF Bit     For description, see Section 7.3.1, Line Replacement Algorithm
V   Valid Bit   For description, see Section 7.2.3, Tag Structure
D   Dirty Bit   For description, see Section 7.2.3, Tag Structure]
Figure 7-1. Organization of Data Cache
7.2.2
Instruction Cache
The Instruction Cache is connected to the CPU pipeline via a 64-bit bus. This enables the CPU to fetch two instructions per cycle from the Instruction Cache. The following diagram shows Instruction Cache structure. Tags are discussed in detail in a later section.
[Figure: organization of the Instruction Cache. Each of the two ways (Way0 and Way1) has 256 entries selected by the virtual index. Each entry holds a physical tag (state bits R, V and a 20-bit PFN) and 64 bytes of data.
R   LRF Bit
V   Valid Bit]
Figure 7-2. Organization of Instruction Cache
7.2.3
Tag Structure
The general structure of a tag consists of a set of state bits and a physical page frame number or PFN field. The Data Cache and the Instruction Cache have different numbers of state bits; for more information, refer to the discussions in the following sections. The size of the tag and the number of virtual address bits indexing the caches are dependent upon the size of the cache, address space, and set associativity. The C790 supports 32-bit virtual and physical addresses as shown in the figure below:
Virtual Address (VA):    VPN = bits 31:12 (bits 13:12 marked), OFFSET = bits 11:0
Physical Address (PA):   PFN = bits 31:12 (bits 13:12 marked), OFFSET = bits 11:0
Since the cache line size is fixed at 64 bytes, that is, four quadwords per entry, the Tag Cache associated with each way will have one tag for every four quadwords. Table 7-2 shows cache sizes, address bits and tag size.
Table 7-2. Cache Size and Access Bits
Cache         Cache Size   Way     Size of Each Way   Cache Virtual Address Index Bits   Tag Cache Size of Each Way   Tag Virtual Address Index
Data          32 K         2 WAY   256 x 64 Bytes     13:4                               256 x 20 Bits                13:6
Instruction   32 K         2 WAY   256 x 64 Bytes     13:3                               256 x 20 Bits                13:6
While the caches are indexed by the virtual address, the tag comparison is physical. This is possible because the caches and the TLB are accessed in parallel. So, when the tags have been accessed, the page frame number is ready to be compared against the translated virtual address for a cache hit or miss.
C790 Programming Note: Overlapping of the cache index bit range and the PFN bit range causes the "cache aliasing problem". The C790 does not have any hardware mechanism to detect cache aliasing. It is the programmer's responsibility to avoid cache aliasing. When a physical page is mapped to different virtual pages, VPN[13:12] must be the same in both virtual addresses. The conservative way to ensure this is to keep VPN[13:12] == PFN[13:12] whenever a page is mapped.
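The aliasing rule can be checked in software before a mapping is installed. The following C sketch is a minimal illustration; the function names are hypothetical, and the index range VA[13:6] is taken from the tag index column of Table 7-2.

```c
#include <stdint.h>

/* Tag index used by both caches: address bits 13:6 (256 entries per way). */
static unsigned cache_tag_index(uint32_t va)
{
    return (va >> 6) & 0xFFu;
}

/* Bits 13:12 are index bits that lie above the 4 KB page offset. */
static unsigned alias_bits(uint32_t addr)
{
    return (addr >> 12) & 3u;
}

/* Two virtual mappings of one physical page are alias-free only if
 * VPN[13:12] is the same in both virtual addresses. */
static int mappings_alias_safe(uint32_t va_a, uint32_t va_b)
{
    return alias_bits(va_a) == alias_bits(va_b);
}

/* Conservative rule: VPN[13:12] == PFN[13:12] for every mapping. */
static int mapping_is_conservative(uint32_t va, uint32_t pa)
{
    return alias_bits(va) == alias_bits(pa);
}
```

An operating system could, for instance, call a check like mapping_is_conservative when choosing a virtual address for a shared page; that usage is a suggestion, not something the manual prescribes.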
7.2.3.1
Data Cache Tag Structure
In addition to the physical page frame number (PFN), each Data Cache Tag entry also contains additional Cache State bits as shown below. All lines in both ways of the Data Cache have these four state bits. Cache line state bits are also illustrated in Figure 7-1.
Data Cache Tag Fields Dirty (D) Valid (V) LRF (R) Lock (L) PFN
Two state bits, DIRTY and VALID, together identify which of three states the Data Cache is in: Valid Clean, Valid Dirty, or Invalid. Table 7-3 shows the state of the Data Cache line as a function of DIRTY and VALID bits.
Table 7-3. Data Cache Line States
Dirty Bit (D)   Valid Bit (V)   Cache Line State
X               0               Invalid
0               1               Valid Clean
1               1               Valid Dirty
Even if a CACHE instruction tries to set the V = 0, D = 1 state, the Dirty bit is forced to zero in the C790 implementation.
The LRF bit is the Least-Recently-Filled line replacement bit. The LRF bits serve as a replacement algorithm between the two ways of the Data Cache. A refill access to a cache line in a way will flip the LRF bit to point to the other way as the least recently filled. For details of the LRF line update operation refer to Section 7.3.1. As Figure 7-1 illustrates, Data Cache lines in each way have a LOCK bit. The LOCK bit, as explained in Section 7.3.7, Data Cache Lock Function, locks lines in one of the ways to keep data from being replaced.
7.2.3.2
Instruction Cache Tag Structure
In addition to the physical page frame number (PFN), each Instruction Cache Tag entry also contains two additional Cache State bits as shown below. All lines in both ways of the Instruction Cache have these two state bits.
Instruction Cache Tag Fields Valid (V) LRF (R) PFN
The Instruction Cache VALID state bit defines whether each line is in the Valid or Invalid states. The LRF bit is the Least-Recently-Filled line replacement bit. LRF bits serve as a replacement algorithm between the two ways of the Instruction Cache. A refill access to a cache line in a way will flip the LRF bit to point to the other way as the least recently filled. For details of LRF line update operation refer to Section 7.3.1.
7.2.4
State of Cache Tags After Reset
For all Data Cache tags the following fields are initialized to 0 upon reset:
* Valid
* Dirty
* LRF
* Lock
For all Instruction Cache tags the following fields are initialized to 0 upon reset:
* Valid
* LRF
All other fields in the Instruction Cache and the Data Cache contents are undefined upon reset.
7.3 Cache Operations
This section describes cache operation in regard to read/write policies, coherency, writeback policy, and the lock function.
7.3.1
Line Replacement Algorithm
The line replacement policy for both the Instruction Cache and the Data Cache is based on the Least Recently Filled (LRF) algorithm. In this policy, the LRF bit of a way is modified (inverted) only when a cache line refill occurs to the corresponding way. Load/store accesses to the Data Cache do not modify the LRF bit. The bit indicating which way is the least recently filled way is the XOR of the two LRF bits of the two ways of the cache.
Table 7-4. LRF Line Replacement Algorithm
Current    Current    XOR   Refill   New        New
Way0 LRF   Way1 LRF         Way      Way0 LRF   Way1 LRF
0          0          0     0        1          0
1          0          1     1        1          1
1          1          0     0        0          1
0          1          1     1        0          0
The column under XOR indicates the way which will be refilled (line replaced) on the next refill at that line location. Note that the table shown above is valid only when neither way of the cache line is locked. If a way of the cache line is locked, then regardless of the state of the LRF bits, the least recently filled way will always be the unlocked way. The behavior is also slightly different for the Instruction and Data Caches when one of the ways is invalid. For the Data Cache the algorithm is followed exactly as given above, irrespective of the ways being valid or invalid. For the Instruction Cache the algorithm given above is followed as long as both ways are valid. Once a way becomes invalid, that way gets priority for being filled over the valid way, irrespective of the LRF bits.
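The LRF policy can be modeled in a few lines of C. This is a sketch under stated assumptions: the type set_state_t and the function names are hypothetical, the Instruction Cache priority for an invalid way is not modeled, and locking both ways (which the manual calls undefined) is not handled.

```c
/* LRF and lock bits of the two ways at one cache index (hypothetical model). */
typedef struct {
    unsigned lrf[2];
    unsigned lock[2];
} set_state_t;

/* Way to be replaced on the next refill: the unlocked way if one way is
 * locked, otherwise the XOR of the two LRF bits. */
static int lrf_victim(const set_state_t *s)
{
    if (s->lock[0]) return 1;
    if (s->lock[1]) return 0;
    return (int)((s->lrf[0] ^ s->lrf[1]) & 1u);
}

/* Perform a refill: pick the victim way and invert that way's LRF bit.
 * Loads and stores that hit do not call this, so they leave LRF alone. */
static int lrf_refill(set_state_t *s)
{
    int way = lrf_victim(s);
    s->lrf[way] ^= 1u;
    return way;
}
```

Starting from both LRF bits at 0 (the reset state), successive refills at the same index alternate between Way0 and Way1.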
7.3.2
Non-blocking Loads and Hit Under Miss
The Data Cache supports non-blocking load and hit under miss to improve performance. When a Data Cache miss occurs or an uncached load instruction is issued, non-blocking load allows the pipeline to continue instruction execution until one of the following occurs:
1. A subsequent non-load/store/pref instruction has a data dependency on the load that is pending (to be retired).
2. Pipeline0 stalls.
Hit under miss is a feature that allows access (load or store) to the Data Cache while a previous load miss (cached, uncached or uncached accelerated), a previous store miss (cached) or a previous prefetch miss (cached) is still pending. In this case, access to the cache proceeds and the pipe does not stall.
Uncached loads also do not stall the pipeline while they are pending (to be retired). The pipeline continues instruction execution until one of the following occurs:
1. A subsequent load/store/pref instruction has a data dependency on the load that is pending (to be retired).
2. A Data Cache miss occurs or a miss occurs on the Uncached Accelerated Buffer.
3. An uncached load instruction is issued.
To summarize, when a Data Cache miss occurs or an uncached load instruction is issued, non-blocking load and hit under miss allow the pipeline to continue instruction execution until one of the following occurs:
1. A subsequent instruction has a data dependency on the load that is pending (to be retired).
2. A Data Cache miss occurs or a miss occurs on the Uncached Accelerated Buffer.
3. An uncached load instruction is issued.
4. Pipeline0 stalls.
Loads to the GPRs (IU) and FPRs (FPU) all follow the non-blocking protocol (when it is enabled). Loads to COP1 are always blocking.
7.3.3
Cache Miss and Hit Operations
In case of a Data Cache hit, the cache provides data to the CPU in 128-bit (single quadword) quantities. In case of an Instruction Cache hit, the cache provides data ("instruction") in 64-bit quantities. CPU reads or writes to the Data Cache in quantities less than 128 bits are specified by the least significant four bits of the address, bits 3:0. Cache misses are processed by the cache controller in 64-byte quantities - one cache line. Since the caches are connected to the system bus via a 128-bit bus, cache refill takes a burst of 4 bus cycles (8 CPU cycles) that is, four quadwords are transferred in 4 bus cycles (actual transfer time can be more due to bus arbitration etc). These reads are performed in sequential order for both the Instruction Cache and the Data Cache. The quadword for which the address missed is always fetched first. Table 7-5 indicates the sequential order. PA[5:4] are two least-significant address bits that are put out on the CPU Bus. Figure 7-3 illustrates the case where the second quadword, shaded area, missed and shows the order in which data are read from main memory.
Table 7-5. Quadword Retrieved Address PA[5:4]
Starting Block     Bus Cycle
Address PA[5:4]    1    2    3    4
00                 00   01   10   11
01                 01   10   11   00
10                 10   11   00   01
11                 11   00   01   10
[Figure: read order for a cache line of four 128-bit quadwords when the second quadword (PA[5:4] = 01, shaded) missed: quadword 01 is read first, 10 second, 11 third, and 00 fourth.]
Figure 7-3. Read Miss Processed in Sequential Order
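The missed-quadword-first sequential order is a simple wrap-around count on PA[5:4]. A minimal C sketch follows; the function name refill_order is hypothetical.

```c
/* Fill order[0..3] with the PA[5:4] value driven in bus cycles 1..4,
 * starting at the quadword that missed and wrapping within the line. */
static void refill_order(unsigned missed_pa, unsigned order[4])
{
    unsigned start = (missed_pa >> 4) & 3u;   /* PA[5:4] of the miss */
    for (unsigned i = 0; i < 4; i++)
        order[i] = (start + i) & 3u;
}
```

For a miss on the second quadword of a line (PA[5:4] = 01), this gives the order 01, 10, 11, 00.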
In case of a write miss to the Data Cache (for an allocate-on-write address), the cache controller reads a cache line from main memory in sequential order. Whether the cache line being replaced is first written out to memory (because its DIRTY bit is set) is discussed in the next section. The Instruction Cache processes cache misses in bursts of 4 quadwords, just like the Data Cache. Furthermore, in case of an Instruction Cache miss, the pipeline starts in the same cycle the final quadword is stored into the Instruction Cache.
7.3.4
Data Cache Writeback Policy
Data cache lines are written back to the memory in the following cases:
1. The processor executes the Index Write Back Invalidate CACHE instruction suboperation, as defined in Appendix C, and the line data are dirty; or a Hit Writeback Invalidate or Hit Writeback without Invalidate CACHE suboperation hits on the Data Cache and the line data are dirty.
2. A read or write miss occurs and the line data are dirty. In this case the line has to be written to memory before it can be replaced by the miss data.
7.3.5
Data Cache State Transitions
As discussed previously, lines in the Data Cache can be in one of three states: Invalid, Valid Clean, or Valid Dirty.
Invalid means the Data Cache entry does not contain valid data. Upon a miss, the cache can load data into this cache line with no further actions.
The Valid Clean state indicates that there are valid data in the Data Cache line and they are the same as memory. All writeback segments have their data in the Valid Clean state until they are written to by the processor. The C790 supports the write-back protocol, hence the need for a Valid Dirty state. A Data Cache line transitions to the Valid Dirty state when the cache line is written to without reflecting the operation on the bus - the writeback protocol. In this case, the data in the cache does not match the data in memory. Figure 7-4 shows the transition diagram of the Data Cache performing according to the writeback policy. For details on the CACHE operation, refer to Appendix C.
[Figure: state transition diagram for a Data Cache line.
Transitions to Invalid: Reset; CACHE Index Invalidate; CACHE Index WriteBack Invalidate; CACHE Hit WriteBack Invalidate (if hit); CACHE Hit Invalidate (if hit); CACHE Index Store Tag (if V = 0).
Transitions to Valid Clean: Read Miss; PREF Miss; CACHE Index Store Tag (if V = 1, D = 0); CACHE Hit W/B without Invalidate (if hit).
Transitions to Valid Dirty: CPU Write; Write Miss; CACHE Index Store Tag (if V = 1, D = 1).
A CPU Read leaves a Valid Clean or Valid Dirty line in its current state.]
Figure 7-4. Data Cache Transition Diagram, Writeback Protocol
7.3.6
Instruction Cache State Transitions
Cache lines in the Instruction Cache can be in either of two states: Invalid or Valid.
Invalid means the Instruction Cache entry does not contain valid instruction data. Upon a miss, the cache can load instructions into this cache line with no further actions.
The Valid state indicates that there are valid instructions in the cache line and so there is no need for miss processing. The transition diagram for the Instruction Cache is simple; refer to Figure 7-5. For details on the CACHE instructions refer to Appendix C.
[Figure: state transition diagram for an Instruction Cache line.
Transitions to INVALID: Reset; CACHE Index Invalidate; CACHE Hit Invalidate (if hit); CACHE Index Store Tag (if V = 0).
Transitions to VALID: CPU Read Miss; CACHE Fill; CACHE Index Store Tag (if V = 1).
A CPU Read leaves a VALID line in the VALID state.]
Figure 7-5. Instruction Cache Transition Diagram
7.3.7
Data Cache Lock Function
In a 2-way set-associative Data Cache, such as the one present in the C790, there is no explicit way of forcing data to be retained in the cache. The LRF-based mechanism dynamically determines which cache line should be replaced. A Data Cache lock function has been defined to aid in retaining critical pieces of data in the Data Cache under strict program control. Each entry on each way of the Data Cache has a Lock (L) bit. The Lock bit aids in locking the line by writing directly into it. After locking the line, the LRF bit is no longer meaningful. Thus, if one of the ways for a particular line is locked, the other way is the only way available for caching. Thus, once a line is locked with a particular physical address tag, any other virtual address which maps onto the same cache line will have only a direct mapped location rather than a 2-way location. To lock the Data Cache, the following two CACHE instruction suboperations can be used:
INDEX STORE TAG (DCACHE) INDEX STORE DATA (DCACHE)
For details of the above CACHE instruction suboperation refer to Section 7.6. To lock a Data Cache line, the following code sequence can be used:
    li      t0,0x00010068   //PTagLo = 0x00010, D=V=L=1, R=0
    mtc0    t0,$28          //t0 -> TagLo
    sync.l
    cache   18,0(r0)        //TagLo -> Tag(way0)
    sync.l
    la      s0,0x00010000
    sw      t1,0(s0)        //store contents of t1 into
                            //locked cache line
In this example, the tag has been modified using the CACHE instruction and the data has been updated using a Store instruction. The following restrictions apply to line locking:
* The result of re-locking a locked line is undefined
* The results of locking both ways of a cache line are undefined
To unlock Data Cache lines, the following code sequence can be used:
    li      t0,0x00010060   //D=V=1, L=R=0
    mtc0    t0,$28          //t0 -> TagLo
    sync.l
    cache   18,0(r0)        //TagLo -> Tag(way0)
    sync.l
7.3.7.1
Operations During Lock
When the lock bit is set for a cache line (index), only the other way is available for handling cache misses. The misses are blocking. A write access to a locked line in the Data Cache takes place only to the cache without affecting the state of memory. Writes to locked cache lines will not set the DIRTY (D) bit.
7.3.8
Relationship Between Cached and Uncached Operations
Uncached and Uncached Accelerated load and store operations are always executed in order on the CPU bus. Cached load operations can precede earlier store data present in buffers on the CPU bus. All store data present in buffers prevents a SYNC (or SYNC.L) instruction from completing until the store data has been sent either to the Data Cache or the CPU bus. Stores with the uncached and uncached accelerated attributes bypass the Data Cache completely.
7.4 Uncached Accelerated Buffer
The C790 has a small read-only cache memory for the uncached accelerated area to reduce bus traffic. This read-only cache, the Uncached Accelerated Buffer (UCAB), can be filled only by the refill process that follows a load miss on the UCAB. Once load instructions hit on the UCAB, data are provided directly from the UCAB. The UCAB is invalidated under the following conditions:
* Any load operation which doesn't hit the UCAB, or
* Any store operation, or
* A SYNC (or SYNC.L) operation, or
* Any exception
Snoop is not supported for the UCAB.
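The UCAB behavior described above can be approximated with a small software model that counts refills. This is a sketch only: the type ucab_t and the function names are hypothetical, only uncached accelerated loads are modeled by ucab_load, and the buffer is treated as a single aligned 128-byte line.

```c
#include <stdint.h>

/* Model of the 128-byte, single-line UCAB (hypothetical helper type). */
typedef struct {
    int      valid;
    uint32_t base;      /* physical address of the buffered 128-byte block */
    unsigned refills;   /* number of 128-byte bus reads performed */
} ucab_t;

/* Uncached accelerated load: a hit is served from the buffer; a miss
 * invalidates the old contents and refills the aligned 128-byte block. */
static void ucab_load(ucab_t *u, uint32_t pa)
{
    uint32_t base = pa & ~0x7Fu;
    if (u->valid && u->base == base)
        return;
    u->base = base;
    u->valid = 1;
    u->refills++;
}

/* Any store, SYNC (or SYNC.L), or exception invalidates the buffer. */
static void ucab_invalidate(ucab_t *u)
{
    u->valid = 0;
}
```

Because any store invalidates the buffer, code that interleaves stores with reads from an uncached accelerated region would refill on every load in this model.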
7.4.1
UCAB Configuration
The UCAB is configured as shown in Table 7-6.
Table 7-6. UCAB Configuration
                              Size        Organization   Line Size   Refill Size
Uncached Accelerated Buffer   128 bytes   Direct Map     128 bytes   128 bytes
7.4.2
Tag Structure
The UCAB is also indexed by the virtual address, while the tag comparison is physical. Table 7-7 shows the UCAB size and access bits.
Table 7-7. UCAB Size and Access Bits
       Size    Way          Size            UCAB Virtual Index Bits   UCAB Tag Size   UCAB Tag Virtual Index Bits
UCAB   128 B   Direct Map   1 x 128 Bytes   6:4                       1 x 25 Bits     -
The least significant 5 bits of the UCAB Tag ([11:7]) are identical to virtual address bits [11:7]. The UCAB Tag has one valid bit. The UCAB Tag does not have Dirty, LRF, or Lock bits. The valid bit of the UCAB Tag is initialized to 0 upon reset.
7.4.3
Non-blocking Loads and Hit Under Miss
Like the Data Cache, the UCAB supports non-blocking load and hit under miss. When an Uncached Accelerated Buffer miss occurs, non-blocking load and hit under miss allow the pipeline to continue instruction execution until one of the following occurs:
1. A subsequent instruction has a data dependency on the load that is pending (to be retired).
2. A Data Cache miss occurs or a miss occurs on the UCAB.
3. An uncached load instruction is issued.
4. Pipeline0 stalls.
7.5 Cache Control Registers
The operations of the caches are controlled by certain programmable bits in the Config register. These bits are:
ICE   Instruction Cache Enable
DCE   Data Cache Enable
IC    Instruction Cache Size
DC    Data Cache Size
IB    Icache Line Size
DB    Dcache Line Size
For details of these configuration bits refer to the COP0 register section. The two cache tag registers TagLo and TagHi are 32-bit read/write registers that hold the tag and state of the cache line during initialization and diagnostics. The Tag registers are manipulated by MTC0 and CACHE instructions.
TagLo Register

Bits    Field    Description
31:12   PTagLo   Specifies physical address bits 31:12
11:7    0        Must be written as zeros, will return zero on reads
6       D        Cache State DIRTY bit (Not used for the Instruction Cache)
5       V        Cache State VALID bit
4       R        LRF Bit
3       L        LOCK Bit (Not used for the Instruction Cache)
2:0     0        Must be written as zeros, will return zero on reads
The TagHi register contains instruction- and operation-specific items (see the next section).
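The TagLo layout (PTagLo in bits 31:12, D in bit 6, V in bit 5, R in bit 4, L in bit 3) can be exercised with a short C sketch. The helper names make_taglo and dcache_line_state and the enum are hypothetical; the two values checked in the usage note are the constants used in the lock and unlock code sequences of Section 7.3.7.

```c
#include <stdint.h>

/* Build a TagLo value from its fields; reserved bits are left at zero. */
static uint32_t make_taglo(uint32_t ptaglo, unsigned d, unsigned v,
                           unsigned r, unsigned l)
{
    return ((ptaglo & 0xFFFFFu) << 12) |
           ((d & 1u) << 6) | ((v & 1u) << 5) |
           ((r & 1u) << 4) | ((l & 1u) << 3);
}

typedef enum { LINE_INVALID, LINE_VALID_CLEAN, LINE_VALID_DIRTY } line_state_t;

/* Data Cache line state as a function of the V and D bits (Table 7-3). */
static line_state_t dcache_line_state(uint32_t taglo)
{
    if (((taglo >> 5) & 1u) == 0)
        return LINE_INVALID;              /* V = 0, D is a don't-care */
    return ((taglo >> 6) & 1u) ? LINE_VALID_DIRTY : LINE_VALID_CLEAN;
}
```

With these helpers, make_taglo(0x00010, 1, 1, 0, 1) yields 0x00010068 (the locking value) and make_taglo(0x00010, 1, 1, 0, 0) yields 0x00010060 (the unlocking value).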
7.6 CACHE Instruction
For information on the CACHE instruction, please refer to Appendix C.
Chapter 8 CPU Bus
8.
CPU Bus
The C790 CPU core is connected to the rest of the system (1), and to external devices, through the group of on-chip C790 system bus signals called the CPU Bus. This chapter defines the architecture of the CPU Bus and describes it in the context of an overall system design. This chapter describes the following:
* the CPU Bus architecture and agents on the CPU Bus
* the types of transactions possible between agents on the bus
* the bus protocols for transactions
(1) The system consists of a DMA Controller (DMAC) as a master, and various slave devices.
8.1 Introduction
The CPU Bus is an on-chip bus in a highly integrated processor. All agents (see the definitions in Section 8.1.1 below) on the CPU Bus are equipped with a CPU Bus interface unit connected via CPU Bus signals. An agent acts like a master when it initiates reads or writes on the bus. An agent acts like a slave when it responds to reads or writes initiated by a master. For the CPU Bus to operate properly, an arbiter is needed to perform arbitration between the CPU and the other bus masters. The arbiter is located in the CPU, and CPU arbitration behavior is discussed in Section 8.5.1, Arbitration Operations.
The following are the main features of the CPU Bus:
* Separate data and address buses (Demultiplexed operation)
* 128-bit data bus
* Clocked synchronous operations
* Peak transfer rate of 2.1 GB/sec (@133 MHz bus clock)
* 8/16/32/64/128-bit and burst accesses
* Multimaster capability
* Pipelined operations
* No turn-around or dead cycles between transfers
The CPU Bus does not provide:
* Cache coherency support
* Split transactions
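As a rough check of the peak transfer rate quoted for the bus, the figure follows from the data bus width and the bus clock, assuming one 128-bit transfer per bus clock. The C sketch below shows the arithmetic; the function name is hypothetical.

```c
#include <stdint.h>

/* Peak bytes per second for a bus that moves bus_bits every bus clock.
 * Assumption: one full-width transfer per clock with no dead cycles. */
static uint64_t peak_bytes_per_sec(unsigned bus_bits, uint64_t bus_hz)
{
    return (uint64_t)(bus_bits / 8u) * bus_hz;
}
```

For a 128-bit bus at 133 MHz this gives 16 bytes x 133,000,000 per second = 2,128,000,000 bytes per second, which matches the quoted figure of about 2.1 GB/sec.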
8.1.1
Terminology
Address Phase is the span of cycles from when an address is driven on the CPU Bus through the cycle in which the address is acknowledged.
Agent refers to any of the devices on the CPU Bus.
Assert means taking a signal to its active level. An active high signal is "1" when asserted, and an active low signal is "0" when asserted.
CPU means the C790 CPU. The terms CPU and C790 are used interchangeably in this chapter.
Data Phase is the span of cycles from when data are driven on the bus through the cycle in which they are acknowledged.
DMAC is the DMA Controller in the system.
Master means the current bus master on the CPU Bus.
MEM refers to the system memory controller.
Negate/Deassert means taking a signal to its inactive state. An active high signal is "0" when deasserted. An active low signal is "1" when negated.
* (after signal name) means active low signal.
8.1.2
Signal Naming Convention
Table 8-1 shows the prefixes used for naming signals in a system incorporating the C790 CPU Bus.
Table 8-1. System Signal Naming Convention
Signal Prefix   Signal Type
CPU             Signals from the CPU multiplexed or logically combined with the DMAC signals to form the system signals. These signals include: CPUADDR, CPUBE*, CPURD*, CPUWR*, CPUTSIZE, CPUASTART*, CPUDSTART*, CPUDATA.
SYS             The combined or multiplexed signals from any agents on the CPU Bus. These signals include: SYSADDR, SYSBE*, SYSRD*, SYSWR*, SYSTSIZE, SYSASTART*, SYSDSTART*, SYSAACK*, SYSDACK*, SYSDATA.
8.2 CPU Bus Architecture
The CPU Bus design is a synchronous pipelined bus with separate data (128-bit) and address buses running at half the clock frequency of the CPU. The CPU is connected to the rest of the system and external devices through this bus. Figure 8-1 illustrates the architecture of the bus and identifies different agents that can be on the bus.
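[Figure: the CPU, containing the data cache (D$), the instruction cache (I$), the write-back buffer (WBB), and the CPU Bus Interface, connected through the CPU Bus to the DMAC, the Memory Controller, and the I/O Devices.]
Figure 8-1. CPU Bus Architecture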
8.2.1
CPU Bus Connectivity for Address and Control Paths
Figure 8-2 illustrates the system-level interconnections for address paths of the CPU Bus. Support logic is needed to handle the fact that the system contains multiple masters. AGNT* is used to control the multiplexer in the support logic that selects a master to be connected to the CPU Bus.
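[Figure: address and control path connections. The C790 CPU drives CPUADDR, CPUBE*, CPUTSIZE, CPURD*, CPUWR*, and CPUASTART*; the DMAC drives DMAADDR, DMATSIZE, DMARD*, DMAWR*, and DMAASTART*. A multiplexer in the support logic, controlled by AGNT* and clocked by BUSCLK, selects between the two masters to produce SYSADDR, SYSBE*, SYSTSIZE, SYSRD*, SYSWR*, and SYSASTART*, which go to the Memory Controller and the I/O Devices. The acknowledge signals shown are DMAAACK*, MEMAACK*, IOAACK*, and SYSAACK*.]
Figure 8-2. CPU Bus Address and Control Path Connections in System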
8.2.2
CPU Bus Connectivity for Data Paths
Figure 8-3 illustrates the system-level interconnections for data paths of the CPU Bus. For read cycles, the support logic must control the multiplexer so that the correct source of data is put on SYSDATA. For write cycles, the support logic must detect whether the cycle is a CPU cycle or a DMA cycle, and use this to control the multiplexer.
Figure 8-3. CPU Bus Data Path Connections in System (block diagram; labeled elements: C790 CPU, DMAC, Memory Controller, I/O Devices, Mux; signals: CPUDATA, DMADATA, MEMDATA, IODATA, SYSDATA; CPUDSTART*, DMADSTART*, SYSDSTART*; DMADACK*, MEMDACK*, IODACK*, SYSDACK*)
8.3 CPU Bus Signal Descriptions
This section describes the CPU Bus signals and their usage in different bus operations.
8.3.1 Address Bus Signals
CPUADDR[31:4]
CPU address bus
CPUADDR[31:4] bits are valid during the address phase and can be sampled by the slave when CPUASTART* is sampled low.
SYSADDR[31:4]
System address bus
SYSADDR[31:4] are multiplexed outputs selecting between CPUADDR[31:4] and DMA address. They are valid during the address phase and can be sampled by the slave when SYSASTART* is sampled low.
CPUBE[15:0]*
CPU byte enables
CPUBE[i]*, driven during the address phase, indicates valid data on byte i of CPUDATA[127:0] during the data phase. CPU byte enables can be sampled by the slave when CPUASTART* is sampled low. CPU byte enables are used only in CPU single cycles.
SYSBE[15:0]*
System byte enables
SYSBE[i]*, driven during the address phase, indicates valid data on byte i of SYSDATA[127:0] during the data phase. System byte enables can be sampled by the slave when SYSASTART* is sampled low. System byte enables are used only in CPU single cycles.
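As an illustration of the active-low byte-enable convention described above, the following C sketch computes a 16-bit enable pattern for an access of a given size within one quadword. The function name `byte_enables_n` and the assumption that "byte i" corresponds to byte offset i are hypothetical; the mapping of byte lanes to addresses (for example under big-endian operation) is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the active-low byte-enable pattern for a
 * `size`-byte access starting at byte `offset` of a 16-byte quadword.
 * Bit i of the result is 0 when byte i carries valid data, 1 otherwise.
 * Assumes byte i corresponds to byte offset i (an assumption). */
uint16_t byte_enables_n(unsigned offset, unsigned size)
{
    assert(size >= 1 && offset + size <= 16);
    uint32_t active = ((UINT32_C(1) << size) - 1u) << offset;
    return (uint16_t)~active;
}
```

For example, a 4-byte access at offset 4 leaves bits 4 to 7 low and all other bits high.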
CPUTRANSTYPE[4:0]
CPU transaction type
CPUTRANSTYPE[4:0], driven during the address phase, indicates the type of operation. CPU transaction type can be sampled by the slave when CPUASTART* is sampled low.
Table 8-2. Bus Transaction Types

CPUTRANSTYPE      Type of Bus Transaction
00000             Not defined or miscellaneous
00001 - 00111     Reserved
01000             Data Cache Refill due to Load Miss
01001             Data Cache Refill due to Prefetch Instruction
01010             Data Cache Refill due to Store Miss
01011             Uncached Load
01100             Uncached Accelerated Load
01101 - 01111     Reserved
10000             Instruction Cache Miss Refill
10001             Cache Instruction - Fill Suboperation
10010             Uncached Execution
10011 - 10111     Reserved
11000             Data Cache Write-back due to Load/Store Miss
11001             Data Cache Write-back due to Cache Instruction
11010             Uncached Store
11011             Uncached Accelerated Store
11100             Non-allocated Store
11101 - 11111     Reserved
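A slave or bus monitor that wants to log transaction types could decode CPUTRANSTYPE along the lines of the following C sketch. The function name `transtype_name` is hypothetical; the code values and descriptions are taken from Table 8-2, and every unlisted code is treated as Reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder for the 5-bit CPUTRANSTYPE field (Table 8-2). */
const char *transtype_name(uint8_t code)
{
    assert(code < 32);  /* CPUTRANSTYPE[4:0] is 5 bits wide */
    switch (code) {
    case 0x00: return "Not defined or miscellaneous";
    case 0x08: return "Data Cache Refill due to Load Miss";
    case 0x09: return "Data Cache Refill due to Prefetch Instruction";
    case 0x0A: return "Data Cache Refill due to Store Miss";
    case 0x0B: return "Uncached Load";
    case 0x0C: return "Uncached Accelerated Load";
    case 0x10: return "Instruction Cache Miss Refill";
    case 0x11: return "Cache Instruction - Fill Suboperation";
    case 0x12: return "Uncached Execution";
    case 0x18: return "Data Cache Write-back due to Load/Store Miss";
    case 0x19: return "Data Cache Write-back due to Cache Instruction";
    case 0x1A: return "Uncached Store";
    case 0x1B: return "Uncached Accelerated Store";
    case 0x1C: return "Non-allocated Store";
    default:   return "Reserved";  /* all codes not listed in Table 8-2 */
    }
}
```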
CPURD*
CPU read
The CPU asserts this signal to indicate a read operation. This signal can be sampled when CPUASTART* is sampled low. This signal is active during the address phase. CPURD* is used in transfers initiated by the CPU.
CPUWR*
CPU write
The CPU asserts this signal to indicate a write operation. This signal can be sampled when CPUASTART* is sampled low. This signal is active during the address phase. CPUWR* is used in transfers initiated by the CPU.
CPUTSIZE[1:0]
CPU transfer size
While driven by the CPU, these signals indicate the size of the transfer in the current CPU initiated bus cycle. They are driven during the address phase and can be sampled starting at the edge where CPUASTART* is sampled low.
Table 8-3. CPU Transfer Size

CPUTSIZE[1:0]   Transfer Size
00              1 Quadword (Single Cycle)
11              4 Quadwords
SYSTSIZE[2:0]
System transfer size
While driven by the system, these signals indicate the size of the transfer in the current system bus cycle. They are driven during the address phase and can be sampled starting at the edge where SYSASTART* is sampled low.
CPUASTART*
CPU address start
Driven by the CPU, it indicates the start of the address phase. Address, byte enable, and control signals (CPUADDR[31:4], CPUBE[15:0]*, CPURD*, CPUWR*, and CPUTSIZE) can be sampled to determine the type of cycle requested starting where CPUASTART* is sampled low. CPUASTART* is driven active for only one cycle.
SYSASTART*
System address start
SYSASTART* is driven by the system; it indicates the start of the address phase. Address, byte enable, and control signals can be sampled to determine the type of cycle requested starting where SYSASTART* is sampled low. SYSASTART* is driven active for only one cycle.
SYSAACK*
System address acknowledge
This signal is an input to all the agents on the CPU Bus indicating that address and control signals have been sampled by the slave. The master terminates the address phase one cycle after sampling SYSAACK* low.
CPUDATA[127:0]
CPU data bus
This is a 128-bit data bus output from the CPU.
SYSDATA[127:0]
System data bus
This is the 128-bit data bus input to all devices on the CPU Bus.
CPUDSTART*
CPU data start
During read/write operations, this output from the CPU indicates the start of data phase. For CPU write operations, the slave can sample data from the bus one cycle after CPUDSTART* has been asserted. For CPU read operations, the slave can output data on the bus any cycle after the cycle CPUDSTART* has been asserted.
SYSDSTART*
System data start
During read/write operations, this output from the system indicates the start of data phase. Data transfer can begin one cycle after SYSDSTART* has been asserted. For DMA cycles, if the slave, providing the data, cannot supply data in the next cycle after the assertion of SYSDSTART*, it is the responsibility of the designer to come up with a new DMA protocol.
SYSDACK*
System data acknowledge
This signal is an input to all the agents on the bus indicating the valid status of data on the bus. During read cycles, it indicates read data are available on the bus to be sampled by the master. During write cycles, it indicates the slave has sampled the data. This signal should be asserted for each data transfer during burst operations. During read transactions, data are sampled one cycle after SYSDACK* has been asserted. During write transactions, the master drives new data on the bus one cycle after detecting SYSDACK* low.
BUSERR*
Bus error
This signal is an input to the CPU and the DMAC which indicates that a bus error has occurred during the transaction. BUSERR* serves to terminate the bus protocol and return bus ownership to the CPU.
INT[1:0]*
Interrupt request lines
These signals are interrupt inputs to the CPU.
SIOINT*
Serial I/O interrupt request
This line provides the serial I/O interrupt from the I/O controller.
NMI*
Non-maskable interrupt
Non-maskable interrupt input to the CPU.
SYSBIGENDIAN
Big Endian enable
This input signal is sampled during cold reset and makes the CPU operate in big-endian mode when it is asserted. The input level of this signal must not be changed during operation.
CPCOND0
Coprocessor conditions
These lines are an input to the CPU as test conditions for some of the branch instructions.
RESET*
Reset
Input to the CPU. When this line is asserted, the CPU, DMAC and slave devices execute a reset.
CPUCLK
CPU clock
CPU clock
BUSCLK
Bus clock
Bus clock: 1/2, 1/3 or 1/4 frequency of the CPUCLK.
AREQ*
Address bus request
This signal is an output from the DMAC to the CPU. When it is asserted, the DMAC requests the address bus mastership.
AGNT*
Address bus grant
This signal is an output from the CPU to grant the bus mastership to the DMAC. This signal is asserted in response to assertion of the AREQ* signal.
REL*
Bus release request
This signal is asserted by the CPU to request that the current bus owner release the CPU Bus.
8.4 Overview of CPU Bus Operations
This section discusses CPU Bus operations; it covers processor requests, DMA operations, and bus error operation. In this section descriptions show CPU signals followed by the system lines, in parentheses, onto which they are asserted. For example: CPUASTART* (SYSASTART*) means CPUASTART* is asserted on the SYSASTART* line. Where a value is given, the bits output by the CPU are shown, followed by the bits, in parentheses, on the system lines. For example if we have 11 on CPUTSIZE[1:0], during a CPU bus cycle, then we will get 011 on the SYSTSIZE[2:0]. This will be shown as 11 (011).
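The 11 (011) notation can be read as the 2-bit CPUTSIZE value appearing zero-extended on the 3-bit SYSTSIZE lines. The C sketch below assumes exactly that, based only on the two documented values (00 and 11); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value seen on SYSTSIZE[2:0] for a CPU-initiated
 * cycle, assuming CPUTSIZE[1:0] is zero-extended (00 -> 000, 11 -> 011). */
uint8_t cpu_tsize_to_sys(uint8_t cpu_tsize)
{
    /* CPU-initiated cycles use only 00 (single) or 11 (burst). */
    assert(cpu_tsize == 0x0 || cpu_tsize == 0x3);
    return (uint8_t)(cpu_tsize & 0x3);
}

/* Number of 128-bit quadwords transferred for a CPUTSIZE value. */
unsigned tsize_quadwords(uint8_t cpu_tsize)
{
    assert(cpu_tsize == 0x0 || cpu_tsize == 0x3);
    return cpu_tsize == 0x3 ? 4u : 1u;
}
```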
8.4.1 CPU Bus Operations
The CPU Bus is different from conventional buses in that it allows pipeline operations. In this case, pipeline implies up to two outstanding requests before any data transaction has taken place. For instance, the CPU may issue two back-to-back read requests to main memory before any data have been returned. Note that at any time, there can only be two outstanding requests on the bus. The master requiring more than two operations has to wait until the first request has been serviced completely prior to issuing the third one.
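The two-request limit can be modeled as a small counter that a master checks before starting a new address phase. The following C sketch is only a behavioral illustration; the type `bus_pipeline_t` and the function names are hypothetical and do not correspond to any hardware register.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OUTSTANDING 2u  /* at most two outstanding requests on the bus */

typedef struct {
    unsigned outstanding;   /* requests issued but not yet fully serviced */
} bus_pipeline_t;

/* Try to issue a new request; fails when two are already outstanding. */
bool pipeline_issue(bus_pipeline_t *p)
{
    if (p->outstanding >= MAX_OUTSTANDING)
        return false;       /* master must wait for the first to complete */
    p->outstanding++;
    return true;
}

/* Record that the oldest outstanding request has been serviced completely. */
void pipeline_complete(bus_pipeline_t *p)
{
    assert(p->outstanding > 0);
    p->outstanding--;
}
```

A third request is accepted only after `pipeline_complete` has been called for the first one.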
8.4.2 Processor Requests
The CPU issues single requests, burst requests or a series of requests to other agents on the bus. These requests are referred to as processor requests initiated through the CPU Bus interface. The processor requests are in response to the following system events:
* Load miss
* Store miss
* Write-back buffer writes (dirty data cache lines, uncached writes, etc.)
* Uncached loads and uncached accelerated loads
* Instruction miss and uncached instruction fetch
Processor read/write requests can be a burst, quadword, or partial quadword of data to and from the main memory or any other system resources. A processor-initiated burst is always 4 quadwords.
8.4.2.1 Read Requests
The CPU initiates read requests by driving address and control on the bus and asserting CPUASTART* (SYSASTART*) to indicate valid address and control. The CPU will keep driving address and control until the slave device has acknowledged the address phase by asserting address acknowledge, SYSAACK*. For burst reads, the CPU drives CPUTSIZE (SYSTSIZE) to 11 (011) to indicate burst reads. The CPU also indicates that it is ready to accept read data by asserting CPUDSTART* (SYSDSTART*). The slave device returns the requested data on the data bus by asserting SYSDACK*, data acknowledge.
8.4.2.2 Write Requests
The CPU initiates write requests by driving address and control on the bus and asserting CPUASTART* (SYSASTART*). The CPU also drives data on the bus and indicates that by asserting CPUDSTART* (SYSDSTART*). The slave device accepts the address and data by asserting SYSAACK* and SYSDACK*, respectively. Burst writes are indicated by driving CPUTSIZE (SYSTSIZE) to 11 (011) during the address phase.
8.4.3 Bus Error Operations
A bus error occurs when the CPU or DMAC initiates a cycle but no device on the CPU Bus responds to it. The absence of a response to either the address phase or the data phase causes the bus error condition. The bus error is always imprecise. When a bus error occurs, all the agents on the CPU Bus, including the CPU, DMAC, and slave devices, terminate the current bus cycle. In the case where the CPU is the initiator of the cycle, there can be two types of bus error:
* Data load/store bus error
* Instruction fetch bus error
A bus error sets the corresponding exception bit in the CAUSE register. Subsequently, the CPU jumps to the proper error handler to examine the exception. However, the bus error exception is imprecise. There is no guarantee that the CPU can recover from this error condition.
In case the DMAC is the initiator of the cycle, the types of bus error depend on the implementation of the DMAC. After a bus error occurs, the DMAC releases the bus mastership back to the CPU and asserts an interrupt or NMI to the CPU. The interrupt or NMI routine then handles the bus error condition for the DMAC.
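The "no response" condition can be pictured as a watchdog that counts bus clocks while an acknowledge is pending. The following C sketch is a behavioral model only; the type `bus_monitor_t`, the function name, and the timeout field `limit` are hypothetical, since this chapter does not give the length of the waiting period.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned waited;  /* BUSCLK cycles spent waiting for an acknowledge */
    unsigned limit;   /* hypothetical timeout, in BUSCLK cycles */
} bus_monitor_t;

/* Call once per BUSCLK cycle while SYSAACK* or SYSDACK* is pending.
 * Returns true in the cycle where BUSERR* would be asserted. */
bool bus_monitor_tick(bus_monitor_t *m, bool ack_seen)
{
    assert(m->limit > 0);
    if (ack_seen) {
        m->waited = 0;        /* response arrived, restart the count */
        return false;
    }
    if (++m->waited >= m->limit) {
        m->waited = 0;        /* BUSERR* lasts for one bus clock */
        return true;
    }
    return false;
}
```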
8.5 CPU Bus Transaction Protocols and Timing
This section describes transaction protocols and the timing for the following CPU Bus operations:
* Arbitration
* CPU single operations (one quadword)
* CPU burst operations (four quadwords)
* CPU non-pipelined single operations (one quadword)
* CPU non-pipelined burst operations (four quadwords)
* Bus error operations
8.5.1 Arbitration Operations
An arbiter is required to mediate between devices requesting the CPU Bus. The arbiter is located in the CPU. The CPU is the default bus master; AREQ* and AGNT* are both deasserted during RESET. A master other than the CPU may request the bus by asserting the request signal, AREQ*. In response to the AREQ* signal, the CPU issues the grant signal, AGNT*, to grant the address bus to the requesting master. In the cycle in which the bus master samples AGNT* active, the master starts its address phases; it deasserts AREQ* at the beginning of the last address phase. When the corresponding data phases commence, the CPU or the requesting master starts the data transfers, depending on the DMA transfer. Data phases follow the exact order of address phases. The arbitration signals are shown in Figure 8-4.
Figure 8-4. Connection of Arbitration Signals (block diagram; the CPU and the bus master both attach to the CPU Bus and are linked by AREQ*, AGNT*, and REL*)
The arbitration priority in using the CPU Bus is that the DMAC always has higher priority than the CPU. When both the CPU and the DMAC arbitrate for the CPU Bus, the arbiter grants the bus mastership to the DMAC. The CPU can assert REL* to the DMAC in an effort to get the bus ownership back from the DMAC. The CPU will proceed with the transfer once the DMAC has released the CPU Bus. The arbitration cycles and protocol are shown in Figure 8-5. In response to the DMAC asserting its request AREQ*, the arbiter asserts AGNT* in cycle 3, which is the arbitration cycle. The DMAC samples AGNT* asserted and begins its address phases. When the DMAC begins the last address phase, in cycle 4, it deasserts its request line AREQ*. The arbiter then waits for the SYSAACK* cycle to deassert AGNT* and release bus mastership back to the CPU.
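The fixed priority rule can be summarized in a few lines of C. This is a purely combinational sketch that ignores cycle timing; the enum `bus_master_t` and the function names are hypothetical, and signals are represented as logical booleans (true means asserted) rather than as active-low levels.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MASTER_CPU, MASTER_DMAC } bus_master_t;

/* The DMAC always wins when it requests the bus (AREQ* asserted);
 * otherwise the CPU remains the default bus master. */
bus_master_t arbitrate(bool cpu_wants_bus, bool areq_asserted)
{
    (void)cpu_wants_bus;  /* CPU demand never overrides a DMAC request */
    return areq_asserted ? MASTER_DMAC : MASTER_CPU;
}

/* REL* is meaningful only while the DMAC owns the bus and the CPU
 * needs it back. */
bool rel_requested(bus_master_t owner, bool cpu_wants_bus)
{
    assert(owner == MASTER_CPU || owner == MASTER_DMAC);
    return owner == MASTER_DMAC && cpu_wants_bus;
}
```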
Figure 8-5. Arbitration Protocol (timing diagram over BUSCLK cycles 1 to 9; signals shown: AREQ*, AGNT*, SYSADDR, SYSASTART*, SYSAACK*; address bus ownership passes from the CPU to the master and back to the CPU)
8.5.1.1 Cycle Stealing
Cycle stealing refers to the CPU's ability to preempt a master in order to perform a bus operation. This operation could be due either to the write back buffer (WBB) being almost full (having more than 64 bytes filled up) or to the CPU needing to perform an instruction or data read. These operations are collectively referred to as cycle stealing operations. Figure 8-6 illustrates the cycle stealing protocol. The arbiter asserts the REL* (Release) signal in response to the CPU's request cycles. The master deasserts its request after having finished its operations. When the master begins its last address phase, it deasserts the AREQ* signal, indicating to the arbiter that the bus will be relinquished, as shown in cycle 9. When the address phase ends, the address bus is returned to the CPU by the deassertion of AGNT* in cycle 12. The arbiter deasserts REL* at the same time AGNT* is deasserted. The data phases follow the same order as the address phases.
Figure 8-6. Cycle Stealing Protocol (timing diagram over BUSCLK cycles 1 to 19; signals shown: AREQ*, AGNT*, SYSADDR, SYSASTART*, SYSAACK*, REL*; the master's last address phase is marked)
8.5.2 CPU Single Operations
CPU Single operations transfer one quadword. In single operations, the CPU drives the address, byte enables, and the read/write signals and indicates their valid status by asserting CPUASTART* (SYSASTART*). The slave samples valid address and control lines and responds by asserting SYSAACK*. In single operations, CPUTSIZE (SYSTSIZE) is always 00 (000). When the CPU detects SYSAACK* active and is ready to put another address on the bus, it will start another address phase. The bus only supports two levels of address pipelining. That means only two address phases can be outstanding before any data phase begins. The CPU indicates that it is ready to accept/supply data by asserting CPUDSTART* (SYSDSTART*) one cycle prior to actually accepting/supplying it. For read cycles, the slave supplies the data and indicates that the data is ready by asserting SYSDACK*. For write cycles, the CPU supplies data one cycle after CPUDSTART* (SYSDSTART*) is asserted, and the slave accepts the data by asserting SYSDACK*.
8.5.2.1 CPU Single Reads
The fastest CPU single read is 2 cycles. Address and data phases for AddrA illustrate the fastest CPU single read cycle. The CPU asserts CPUASTART* (SYSASTART*) to begin the address phase in cycle 1. The slave device asserts SYSAACK* in cycle 1 to indicate that it has sampled the address. The CPU then begins another address phase in cycle 3. The assertion of SYSDACK* by the slave device in cycle 1 triggers the CPU to sample SYSDATA at the end of cycle 2.
Figure 8-7. CPU Single Reads (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrD with data A to D, SYSTSIZE = 0)
8.5.2.2 CPU Single Writes
The fastest CPU single write is 2 cycles. Address and data phases for AddrA illustrate the fastest CPU single write cycle. The CPU always drives data onto CPUDATA one cycle after the assertion of CPUDSTART* (SYSDSTART*). For example, in Figure 8-8, the CPU drives CPUDATA in cycle 2, which is one cycle after the assertion of CPUDSTART* (SYSDSTART*) in cycle 1. The slave device samples SYSDATA one cycle after the assertion of SYSDACK*.
Figure 8-8. CPU Single Writes (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, CPUDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrD with data A to D, SYSTSIZE = 0)
8.5.2.3 CPU Single Read-Write-Read-Write Cycles
All adjacent address phases are read-write or write-read cycles. AddrA is a read address and AddrB is a write address, and so on.
Figure 8-9. CPU Single Read-Write-Read-Write Cycles (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, CPUDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrE, SYSTSIZE = 0)
8.5.3 CPU Burst Operations
CPU Burst operations transfer four quadwords. In burst operations, the CPU drives the address and control signals and indicates their validity by asserting CPUASTART* (SYSASTART*). The slave samples valid address and control lines and asserts SYSAACK* to acknowledge the address phase. The address phase is the cycles from CPUASTART* (SYSASTART*) asserted to one cycle after SYSAACK* is asserted. When the CPU detects SYSAACK* active and has another address ready, it will start another address phase. The CPU indicates that it is ready to accept/supply data by asserting CPUDSTART* (SYSDSTART*) one cycle prior to actually accepting/supplying it. For read cycles, the slave supplies the data and indicates that data are valid by asserting SYSDACK* one cycle prior to the data being available. For write cycles, the CPU supplies data one cycle after CPUDSTART* (SYSDSTART*) is asserted, and the slave accepts the data by asserting SYSDACK*. For burst cycles, SYSDACK* is asserted once for each data transfer. CPUTSIZE (SYSTSIZE) indicates the number of quadwords in the transfer. CPU-initiated cycles use only the values 00 (for CPU Single operations) and 11 (for CPU Burst operations), which indicate a single quadword and a burst of 4 quadwords, respectively.
8.5.3.1 CPU Burst Reads
The fastest CPU burst read is 5 cycles. Address and data phases for AddrA illustrate the fastest CPU burst read cycle. The slave device asserts SYSDACK* four times for every CPU burst read cycle. The slave device asserts SYSDACK* in cycles 1, 2, 3, and 4 to indicate that data can be sampled by the CPU at the end of cycles 2, 3, 4, and 5.
Figure 8-10. CPU Burst Reads (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrD with data A1 to A4 and B1 to B4, SYSTSIZE = 3)
8.5.3.2 CPU Burst Writes
The fastest CPU burst write is 5 cycles. Address and data phases for AddrA illustrate the fastest CPU burst write cycle. After assertion of CPUDSTART* (SYSDSTART*) in cycle 1, the CPU drives the first data on CPUDATA in cycle 2. As SYSDACK* is sampled asserted in cycles 1, 2, 3, and 4, the CPU drives new data on CPUDATA at the end of cycles 2, 3, 4, and 5.
Figure 8-11. CPU Burst Writes (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, CPUDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrD with data A1 to A4, B1 to B4, and C1, SYSTSIZE = 3)
8.5.3.3 CPU Burst Read-Write Cycles
All adjacent address phases are read-write or write-read cycles. AddrA is a read address and AddrB is a write address, and so on.
Figure 8-12. CPU Burst Read-Write Cycles (timing diagram; signals shown: BUSCLK, SYSADDR, SYSDATA, CPUDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrC with data A1 to A4, B1 to B4, and C1, SYSTSIZE = 3)
8.5.3.4 CPU Burst Write-Read Cycles
All adjacent address phases are read-write or write-read cycles. AddrA is a write address and AddrB is a read address, and so on.
Figure 8-13. CPU Burst Write-Read Cycles (timing diagram; signals shown: BUSCLK, SYSADDR, SYSDATA, CPUDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrC with data A1 to A4, B1 to B4, and C1, SYSTSIZE = 3)
8.5.4 CPU Non-Pipeline Single Operations
The CPU Bus can support non-pipeline operations as well as pipeline operations. The non-pipeline operations are done simply by delaying the assertion of SYSAACK* until the last SYSDACK* of the bus transaction. The advantage of this is that the peripheral does not need to save the current address; it just decodes the address on the address bus for the current operation. Using this mode of operation simplifies the peripheral interfaces to the CPU Bus but it degrades the system performance.
8.5.4.1 CPU Non-Pipeline Single Reads
All adjacent address phases are read cycles.
Figure 8-14. CPU Non-Pipeline Single Reads (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrC with data A to C, SYSTSIZE = 0)
8.5.4.2 CPU Non-Pipeline Single Writes
All adjacent address phases are write cycles.
Figure 8-15. CPU Non-Pipeline Single Writes (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, CPUDATA, SYSDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA to AddrC with data A to C, SYSTSIZE = 0)
8.5.5 CPU Non-Pipeline Burst Operations
8.5.5.1 CPU Non-Pipeline Burst Reads
All adjacent address phases are read cycles.
Figure 8-16. CPU Non-Pipeline Burst Reads (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, SYSDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA and AddrB with data A1 to A4 and B1 to B4, SYSTSIZE = 3)
8.5.5.2 CPU Non-Pipeline Burst Writes
All adjacent address phases are write cycles.
Figure 8-17. CPU Non-Pipeline Burst Writes (timing diagram over BUSCLK cycles 1 to 10; signals shown: SYSADDR, CPUDATA, SYSDATA, SYSTSIZE, SYSWR*, SYSRD*, SYSASTART*, SYSAACK*, SYSDSTART*, SYSDACK*; addresses AddrA and AddrB with data A1 to A4 and B1 to B4, SYSTSIZE = 3)
8.5.6 Bus Error Operations
A bus error occurs when no slave responds to the address or data phase of a bus cycle. When a bus error occurs, the current bus operation is terminated, and the system proceeds with the next bus operation. Without bus error detection, the CPU Bus would remain waiting indefinitely for the SYSAACK* or SYSDACK* signals. Bus error is generated by the CPU Bus monitor logic. The monitor logic makes sure that the address and data phases of the current CPU Bus cycle receive SYSAACK* and SYSDACK*, respectively. When there is no SYSAACK* or SYSDACK* response to the address or data phase of the current CPU Bus cycle for a pre-defined period of time, a bus error is generated by asserting BUSERR* for one CPU Bus clock. Bus error has higher priority than SYSAACK* or SYSDACK* if they are detected in the same cycle. Bus error is always asserted in reference to the data phase of the cycle. The exact timing is the cycles from SYSDSTART* asserted to the cycle before the assertion of the next SYSDSTART*. The bus error signal is sampled when the system is waiting for the assertion of SYSDACK* and/or SYSAACK* of the operation corresponding to the current data phase. For example, if the address phase of a certain cycle has no response from the slave devices, the bus monitor logic will wait until the SYSDSTART* of the corresponding data phase before generating the bus error. The bus monitor logic can generate the bus error any time before the next data phase begins.
8.5.6.1 Bus Error Exceptions
As mentioned before, two operations can be pipelined on the CPU bus, and these two operations can be initiated from either the CPU as master or the DMAC as master. If the bus error occurs in the CPU initiated operation, the following occurs:
* a bus error exception due to instruction fetch or data access is generated
* the bus error instruction or data address is recorded in the BadPAddr Register of COP0
* the Status.BEM bit is set (This bit is the bus error mask (BEM) in the COP0 Status Register).
Once a bus error occurs, any further bus errors are ignored until Status.BEM is cleared by the bus error exception handler. If the bus error occurs in the DMA initiated operation (DMA cycle), the DMAC will finish the pending pipeline operations, disable itself, release the CPU Bus, and cause an interrupt. The interrupt routine will then service and re-enable the DMAC accordingly. Table 8-4 summarizes the exception generation:
Table 8-4. Bus Error Exceptions

Operation with the Bus Error       Exception Generated
CPU Initiated Instruction Fetch    Bus Error Exception - Instruction Fetch
CPU Initiated Data Access          Bus Error Exception - Data Access
DMA Cycle                          Interrupt Exception
8.5.6.2 CPU Bus Cycle Termination
Two pipeline operations can be in progress at any time, but if a bus error occurs, only the operation with the bus error is terminated. That is, the occurrence of a bus error with one master does not affect the program execution of another master. For example, if bus error occurs when the first and second operations are initiated from the DMAC and CPU, respectively, the CPU Bus will terminate the DMA operation and continue with the CPU operation. Table 8-5 summarizes CPU Bus cycle sequence for all types of CPU Bus cycle termination.
Table 8-5. Operation Termination Sequence

First Operation with Bus Error: CPU Cycle #1
Second Operation: CPU Cycle #2
CPU Bus Cycle Sequence:
1. CPU Cycle #1 is terminated.
2. Bus Error Exception occurs.
3. CPU Cycle #2 continues on.

First Operation with Bus Error: CPU Cycle #1
Second Operation: DMA Cycle #2
CPU Bus Cycle Sequence:
1. CPU Cycle #1 is terminated.
2. Bus Error Exception occurs.
3. DMA Cycle #2 continues on.

First Operation with Bus Error: DMA Cycle #1
Second Operation: CPU Cycle #2
CPU Bus Cycle Sequence:
1. DMA Cycle #1 is terminated.
2. CPU Cycle #2 continues on.
3. DMAC releases the CPU Bus, disables itself (disabling further requests until the interrupt routine re-enables the DMAC), and generates an interrupt.
4. CPU cycles continue on.

First Operation with Bus Error: DMA Cycle #1
Second Operation: DMA Cycle #2
CPU Bus Cycle Sequence:
1. DMA Cycle #1 is terminated.
2. DMA Cycle #2 continues on.
3. DMAC releases the CPU Bus, disables itself (disabling further requests until the interrupt routine re-enables the DMAC), and generates an interrupt.
4. CPU cycles continue on.
8.5.6.3 Bus Error Timing with No Pending Operation
If there are no pending operations on the bus, BUSERR* is ignored at all times.
8.5.6.4 Bus Error Timing with One Pending Operation
If there is one pending operation on the bus, BUSERR* is sampled while waiting for the assertion of SYSAACK* or SYSDACK*. If BUSERR* is asserted, the bus cycle will continue as if the SYSAACK* and/or the last SYSDACK* has been asserted. Figure 8-18, Figure 8-19, and Figure 8-20 illustrate the bus error associated with one pending operation. In these figures, BUSERR* is ignored before CPUDSTART* and after BUSERR* is asserted because the bus is not waiting for the assertion of SYSAACK* or SYSDACK*.
Figure 8-18. One Operation with BUSERR* as the Last SYSDACK* (timing diagram; signals shown: BUSCLK, CPUADDR, CPUWR*, CPUTSIZE, CPUASTART*, SYSAACK*, CPUDATA, CPUDSTART*, SYSDACK*, BUSERR*; data D0 to D2, CPUTSIZE = 3; BUSERR* is marked Ignored outside the Bus Error Detection window)
Figure 8-19. One Operation with BUSERR* as SYSAACK* (timing diagram; signals shown: BUSCLK, CPUADDR, CPUWR*, CPUTSIZE, CPUASTART*, SYSAACK*, CPUDATA, CPUDSTART*, SYSDACK*, BUSERR*; data D0 to D3, CPUTSIZE = 3; BUSERR* is marked Ignored outside the Bus Error Detection window)
Figure 8-20. One Operation with BUSERR* as SYSAACK* and the Last SYSDACK* (timing diagram; signals shown: BUSCLK, CPUADDR, CPUWR*, CPUTSIZE, CPUASTART*, SYSAACK*, CPUDATA, CPUDSTART*, SYSDACK*, BUSERR*; data D0 to D2, CPUTSIZE = 3; BUSERR* is marked Ignored outside the Bus Error Detection window)
8.5.6.5 Bus Error Timing with Two Pending Operations
If there are two pending operations on the bus, BUSERR* is sampled while waiting for the assertion of SYSDACK*. If BUSERR* is asserted, the bus cycle will continue as if the last SYSDACK* has been asserted. The bus cycle will then proceed with the data phase of the next operation. The bus error that occurred is for the first pending operation. Figure 8-21 illustrates the bus error associated with two pending operations. In this figure, BUSERR* is ignored after BUSERR* is asserted because the bus is no longer waiting for the assertion of SYSDACK* corresponding to operation AddrA with the bus error, and detection of bus error for operation AddrB does not start until the assertion of CPUDSTART*.
Figure 8-21. Two Operations with Bus Error as the Last SYSDACK* (timing diagram; signals shown: BUSCLK, CPUADDR, CPUWR*, CPUTSIZE, CPUASTART*, SYSAACK*, CPUDATA, CPUDSTART*, SYSDACK*, BUSERR*; addresses AddrA and AddrB with data A0 to A2 and B0, CPUTSIZE = 3; a Bus Error Detection window for A is followed by an Ignored interval and a Bus Error Detection window for B)
Chapter 9 Performance Counter
9. Performance Counter
The performance counter provides the means for gathering statistical information about the internal events of the CPU and the pipeline during program execution. The statistics gathered during program execution aid in tuning the performance of hardware and software systems based on the processor.
9.1 Overview
The performance counter consists of one control register and two counters. The control register controls the functions of the monitor while the counters count the number of events specified by the control register.
9.2 Performance Counters and Performance Control Registers
The Performance Counter Control Register, or PCCR, and Performance Counter Registers PCR0 and PCR1 are mapped into COP0 Register 25. Both the register and counters are read/write registers accessible by MTPC, MTPS, MTC0, MFPC, MFPS and MFC0 instructions. Each counter is capable of counting one event as specified by the control register. The format of the PCCR is shown in Figure 9-1, and the format of PCR0 and PCR1 is shown in Figure 9-2.
Bits     Field     Width
31       CTE       1
30:20    0         11
19:15    EVENT1    5
14       U1        1
13       S1        1
12       K1        1
11       EXL1      1
10       0         1
9:5      EVENT0    5
4        U0        1
3        S0        1
2        K0        1
1        EXL0      1
0        0         1
Figure 9-1. Format of the Performance Counter Control Register PCCR
Bits     Field     Width
31       OVFL      1
30:0     VALUE     31
Figure 9-2. Format of Performance Counter Registers PCR0 and PCR1
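Software reading PCR0 or PCR1 can separate the two fields as in the following C sketch. The layout (OVFL in bit 31, VALUE in bits 30:0) is read from Figure 9-2; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCR_OVFL        (UINT32_C(1) << 31)    /* bit 31: OVFL field */
#define PCR_VALUE_MASK  UINT32_C(0x7FFFFFFF)   /* bits 30:0: VALUE field */

/* True when the OVFL bit of a PCR0/PCR1 image is set. */
bool pcr_overflowed(uint32_t pcr)
{
    return (pcr & PCR_OVFL) != 0;
}

/* Extract the 31-bit VALUE field. */
uint32_t pcr_value(uint32_t pcr)
{
    return pcr & PCR_VALUE_MASK;
}

/* Build a register image with OVFL clear and the given starting count. */
uint32_t pcr_make(uint32_t value)
{
    assert(value <= PCR_VALUE_MASK);  /* VALUE is only 31 bits wide */
    return value;
}
```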
The interpretation of the PCCR register bits is as follows:
Table 9-1. PCCR Register Bits

Field       Initial Value   Function
CTE         0               If 1, PCR0 and PCR1 counting and exception generation is enabled.
EVENT0/1    Undefined       Event counted by PCR0/1; see Table 9-5 for details.
U0/1        Undefined       PCR0/1 counts event EVENT0/1 when in User mode.
S0/1        Undefined       PCR0/1 counts event EVENT0/1 when in Supervisor mode.
K0/1        Undefined       PCR0/1 counts event EVENT0/1 when in non-exception Kernel mode; i.e. with both STATUS.EXL and STATUS.ERL set to 0.
EXL0/1      Undefined       PCR0/1 counts event EVENT0/1 when in Level 1 exception handler.
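The PCCR fields can be expressed as C masks and shifts, as sketched below. The bit positions are read from Figure 9-1 and should be checked against the original figure; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as read from Figure 9-1 (an assumption to verify). */
#define PCCR_CTE           (UINT32_C(1) << 31)
#define PCCR_EVENT1_SHIFT  15
#define PCCR_U1            (UINT32_C(1) << 14)
#define PCCR_S1            (UINT32_C(1) << 13)
#define PCCR_K1            (UINT32_C(1) << 12)
#define PCCR_EXL1          (UINT32_C(1) << 11)
#define PCCR_EVENT0_SHIFT  5
#define PCCR_U0            (UINT32_C(1) << 4)
#define PCCR_S0            (UINT32_C(1) << 3)
#define PCCR_K0            (UINT32_C(1) << 2)
#define PCCR_EXL0          (UINT32_C(1) << 1)
#define PCCR_EVENT_MASK    UINT32_C(0x1F)      /* EVENT fields are 5 bits */

/* Return `pccr` with the EVENT0 field replaced by `event`. */
uint32_t pccr_set_event0(uint32_t pccr, uint8_t event)
{
    assert(event <= PCCR_EVENT_MASK);
    pccr &= ~(PCCR_EVENT_MASK << PCCR_EVENT0_SHIFT);
    return pccr | ((uint32_t)event << PCCR_EVENT0_SHIFT);
}

/* Return `pccr` with the EVENT1 field replaced by `event`. */
uint32_t pccr_set_event1(uint32_t pccr, uint8_t event)
{
    assert(event <= PCCR_EVENT_MASK);
    pccr &= ~(PCCR_EVENT_MASK << PCCR_EVENT1_SHIFT);
    return pccr | ((uint32_t)event << PCCR_EVENT1_SHIFT);
}
```

Because all fields other than CTE are undefined after reset, software would typically build a complete value this way before writing it to the register.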
9.2.1 Accessing Counters and Registers
The counter control register PCCR and the two performance counter registers PCR0 and PCR1 are accessed by using MTC0* and MFC0* instructions. All three registers are mapped to COP0 register 25. Table 9-2 illustrates how these registers are written by using the MTC0 instruction, and Table 9-3 illustrates the encoding of the MFC0 instructions used to read the registers. Table 9-4 shows special mnemonics to access the performance counters and registers.
Table 9-2. Writing Performance Counters and Registers using MTC0

OpCode[15:11]  OpCode[1:0]  Operation
11001          00           Move to Counter Control Register
11001          01           Move to Performance Counter Register 0
11001          10           unused
11001          11           Move to Performance Counter Register 1
Table 9-3. Reading Performance Counters and Registers using MFC0

OpCode[15:11]  OpCode[1:0]  Operation
11001          00           Move from Counter Control Register
11001          01           Move from Performance Counter Register 0
11001          10           unused
11001          11           Move from Performance Counter Register 1
Table 9-4. Mnemonics to Access the Performance Counters and Registers

Mnemonic  Operation
MTPC      Move to Performance Counter
MTPS      Move to Performance Event Specifier
MFPC      Move from Performance Counter
MFPS      Move from Performance Event Specifier

* MTPC, MTPS, MFPC and MFPS are special encodings of MTC0 and MFC0.
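The encodings in Tables 9-2 and 9-3 can be sketched as a small C helper that builds the instruction word. This is only an illustration: the function and enum names are hypothetical, and the base encodings 0x40800000 (MTC0) and 0x40000000 (MFC0), with the general register number in bits 20:16, are taken from the standard MIPS instruction format rather than from this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Selector values for OpCode[1:0], from Tables 9-2 and 9-3. */
enum perf_reg { PERF_PCCR = 0x0, PERF_PCR0 = 0x1, PERF_PCR1 = 0x3 };

#define COP0_REG_PERF 25u   /* OpCode[15:11] = 11001 */

/* Instruction word for an MTC0 from general register rt to COP0 register 25.
 * Assumption: standard MIPS MTC0 base encoding 0x40800000. */
uint32_t encode_perf_mtc0(unsigned rt, enum perf_reg sel)
{
    assert(rt < 32);
    return UINT32_C(0x40800000) | ((uint32_t)rt << 16) |
           (COP0_REG_PERF << 11) | (uint32_t)sel;
}

/* Instruction word for an MFC0 from COP0 register 25 to general register rt.
 * Assumption: standard MIPS MFC0 base encoding 0x40000000. */
uint32_t encode_perf_mfc0(unsigned rt, enum perf_reg sel)
{
    assert(rt < 32);
    return UINT32_C(0x40000000) | ((uint32_t)rt << 16) |
           (COP0_REG_PERF << 11) | (uint32_t)sel;
}
```

For example, writing general register 8 to PCR0 would be encoded as `encode_perf_mtc0(8, PERF_PCR0)`. In practice an assembler that knows the MTPC/MTPS/MFPC/MFPS mnemonics produces these words directly.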
9.2.2 State of Performance Counter Control Registers Upon Reset
The CTE bit of the Performance Counter Control Register PCCR is initialized to 0 upon reset. This prevents event counting and interrupt generation until the control registers are initialized. It also gives software a precise way to initialize the counters; see Section 9.3.4 for more details. Note that the remaining bits of PCCR and both registers PCR0 and PCR1 must be initialized by software.
9.3 Counter Operation
The performance counters PCR0 and PCR1 increment by 1 whenever their corresponding count event occurs and the counter is enabled. The count event for PCR0 is specified by PCCR.EVENT0 and the count event for PCR1 is specified by PCCR.EVENT1. The encoding of the EVENT field is specified in Table 9-5 and discussed in detail later. A counter is enabled only when both of the following conditions are satisfied:
1. The global counter enable flag PCCR.CTE is set to 1, and
2. The current privilege mode matches the permitted privilege mode for the counter.
The values in PCCR.U0, PCCR.S0, PCCR.K0, and PCCR.EXL0 specify the permitted privilege modes for PCR0, and PCCR.U1, PCCR.S1, PCCR.K1, and PCCR.EXL1 specify the permitted privilege modes for PCR1. For example, if the current privilege mode is Supervisor, PCR0 will operate only if PCCR.S0 is set to 1. Note that there is no "ERL0" or "ERL1" flag in PCCR. This is because counters are unconditionally disabled when in level 2 handlers.
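The two enabling conditions can be summarized as a predicate. The sketch below is a software model in C for illustration only; the type and function names are hypothetical, and the five-way mode enumeration is a simplification of the STATUS register state described in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the current privilege state. */
enum cpu_mode { MODE_USER, MODE_SUPERVISOR, MODE_KERNEL, MODE_EXL, MODE_ERL };

/* Per-counter mode flags: PCCR.Un, PCCR.Sn, PCCR.Kn, PCCR.EXLn. */
struct counter_ctl {
    bool u, s, k, exl;
};

bool counter_enabled(bool cte, struct counter_ctl c, enum cpu_mode mode)
{
    if (!cte)
        return false;                    /* condition 1: global enable */
    switch (mode) {                      /* condition 2: mode match */
    case MODE_USER:       return c.u;
    case MODE_SUPERVISOR: return c.s;
    case MODE_KERNEL:     return c.k;    /* EXL = 0 and ERL = 0 */
    case MODE_EXL:        return c.exl;  /* level 1 handler */
    case MODE_ERL:        return false;  /* level 2: always disabled */
    }
    return false;
}
```

The `MODE_ERL` case has no matching flag, which mirrors the absence of "ERL0" and "ERL1" bits in PCCR.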
9.3.1 Counter Events
A counter increments if it is enabled and its trigger event occurs. The permissible values for PCCR.EVENT0 and PCCR.EVENT1 are shown in Table 9-5 below. The events are described in Section 9.3.1.1, Event Descriptions.
Table 9-5. Counter Events

Event  Counter 0                       Counter 1
0      reserved                        Low-order branch issued
1      Processor cycle                 Processor cycle
2      Single instruction issue        Dual instruction issue
3      Branch issued                   Branch mispredicted
4      BTAC miss                       JTLB miss
5      ITLB miss                       DTLB miss
6      I$ miss                         D$ miss
7      DTLB accessed                   WBB single request unavailable
8      Non-blocking load/store         WBB burst request unavailable
9      WBB single request              WBB burst request almost full
10     WBB burst request               WBB burst request full
11     CPU address bus busy            CPU data bus busy
12     Instruction completed           Instruction completed
13     Non-BDS instruction completed   Non-BDS instruction completed
14     reserved                        COP1 instruction completed
15     Load completed                  Store completed
16     No event                        No event
17-31  reserved                        reserved
9.3.1.1 Event Descriptions
In event descriptions, the word `branch' (for example, `branch issued' or `branch mispredicted') means any `transfer of control' instruction that is subject to prediction (that is, all the conditional branch instructions, J, and JAL). The JR, JALR, ERET, SYSCALL, BREAK, and TRAP instructions are not included.
Branch issued
This event is triggered whenever a branch is issued to a functional pipe. Note that a branch that is issued in a pipelined implementation may get canceled if an instruction prior to it signals an exception.

Branch mispredicted
This event is triggered whenever the predicted branch address (taken or not-taken) is incorrect. Note that a branch that is issued in a pipelined implementation may get canceled if an instruction prior to it signals an exception.

BTAC miss
This event is triggered whenever the instruction address lookup into the BTAC fails. It counts low-order (even) branch instructions that miss the BTAC. Note that a high-order (odd) branch does not refer to the BTAC.

COP1 instruction completed
This event is triggered when a COP1 instruction completes. The event is signaled even if the COP1 instruction completes successfully but appears in the branch delay slot of a branch-likely instruction and is therefore nullified.

CPU address bus busy
Generates a signal once every BUSCLK (not CPU clock) that the CPU address bus is unavailable. The CPU address bus is considered unavailable whenever it is busy, or when two addresses have been issued but the data for the first address has yet to return.

Data cache miss
This event is triggered whenever a data cache miss is detected. See Table 9-6 for the D$ miss definition. In this event, a data cache miss is defined as any load/store/pref instruction which may generate a bus read operation to get the missed data from external memory.

Table 9-6. Definition of Data Cache Miss

Access  DCE  Page Attr.             Hit/Miss
Load    0    Uncached, UCA, Cached  Miss
Load    1    Uncached, UCA          Miss
Load    1    Cached                 Hit/Miss
Store   0    Uncached, UCA, Cached  Hit
Store   1    Uncached, UCA          Hit
Store   1    Cached                 Hit/Miss
Pref    0    Uncached, UCA, Cached  Uncount *
Pref    1    Uncached, UCA          Uncount *
Pref    1    Cached                 Hit/Miss

* A prefetch to an Uncached or UCA page is treated as a nop.
DTLB accessed
Barring canceled instructions, this event counts the total number of executed loads and stores. Thus, `data cache miss' divided by `DTLB accessed' provides a good estimate of the D$ miss rate (assuming no uncached loads/stores occur). Also, `DTLB miss' divided by `DTLB accessed' provides the DTLB miss rate. The DTLB is accessed even when an unmapped page is accessed if the minor revision number is 0x10 or later.

DTLB miss
This event is triggered whenever a DTLB miss is detected. The DTLB is accessed even when an unmapped page is accessed if the minor revision number is 0x10 or later.

Dual instruction issued
This event is signaled whenever both functional pipes of the C790 are issued instructions*. The event counter is incremented by 1.

Instruction cache miss
This event is triggered whenever an instruction cache miss is detected.

Instruction completed
This event triggers when an instruction completes. Note that some instructions (e.g. SYSCALL, TEQ, TEQI, etc.) signal exceptions as a normal part of their operation. Such instructions are considered complete whether or not the "normal" exception was raised. Therefore, an "instruction complete" event is signaled even if a TEQ succeeds (i.e. raises a Trap exception). However, if a "true" exception occurs (e.g. a counter exception is signaled while the TEQ is executing), the instruction is canceled and no "instruction complete" signal is generated. Similarly, an instruction in the branch delay slot (BDS) of a branch-likely instruction is counted as complete even if the BDS instruction is nullified. If the BDS instruction is canceled because of a "true" exception, no "instruction completed" event is signaled.
C790 Implementation Note: Up to two instructions can complete every cycle in the C790. When two instructions do complete, the event counter is incremented by 2.

ITLB miss
This event is triggered whenever an ITLB miss is detected.

JTLB miss
This event is triggered whenever a JTLB miss is detected.

Load completed
This event triggers when a load instruction completes. Note that the event is signaled even if the load appears in the branch delay slot of a branch-likely instruction that is not taken and is therefore nullified.

Low-order branch issued
Counts the number of issued branches that appeared in the low-order (even) position of an instruction pair fetch. This count is needed since only these branches are subject to BTAC lookup.

No event
This "event" effectively disables the corresponding counter. It is useful principally if only one of the two counters needs to be activated.

Non-BDS instruction completed (for stepping)
This event triggers when an instruction that does not have a branch delay slot completes. In particular, it does not trigger when a branch or jump instruction completes. However, it does trigger when the instruction in the branch delay slot of the branch or jump completes. In the case of a branch-likely instruction, the instruction in the branch delay slot triggers the event even if this instruction is nullified. Note: this event is useful for stepping over instructions.

* (Dual instruction issued) * 2 + (Single instruction issued) = instruction issued
  (Instruction issued) - (Instruction completed) = instruction canceled
Non-blocking load/store (1st cache miss)
This event is signaled whenever a cached load/store/pref instruction misses in the Data Cache and there is no pending data cache miss, UCAB miss or uncached load.

Processor cycle
This event triggers on every processor clock cycle.

Single instruction issued
This event is signaled whenever only one of the functional pipes of the C790 is issued an instruction*.

Store completed
This event triggers when a store instruction completes. Note that the event is signaled even if the store appears in the branch delay slot of a branch-likely instruction that is not taken and is therefore nullified.

WBB Single Request
A non-burst request was made to the WBB.

WBB Burst Request
A burst request was made to the WBB.

WBB Single Request unavailable
A non-burst request was made to the WBB, but there were insufficient free entries in the WBB to service it. All 8 entries are in use at that time.

WBB Burst Request unavailable
A burst request was made to the WBB, but the WBB was completely full or did not have enough free entries to service the request. 5, 6, 7 or 8 entries are in use at that time.

WBB Burst Request almost full
A burst request was made to the WBB, and even though there were free entries, there were not enough to service the request. 5, 6 or 7 entries are in use at that time.

WBB Burst Request full
A burst request was made to the WBB, but the WBB was completely full. All 8 entries are in use at that time.

* (Dual instruction issued) * 2 + (Single instruction issued) = instruction issued
  (Instruction issued) - (Instruction completed) = instruction canceled
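The footnote formulas and the miss-rate ratio mentioned under `DTLB accessed' can be applied to raw counter readings as follows. This is a minimal sketch in C with hypothetical function names; it only performs the arithmetic and assumes the counts have already been collected (possibly over several runs, since only two events can be counted at a time).

```c
#include <assert.h>
#include <stdint.h>

/* Instructions issued = 2 * (dual issue events) + (single issue events). */
uint64_t instructions_issued(uint64_t dual, uint64_t single)
{
    return 2 * dual + single;
}

/* Instructions canceled = issued - completed. */
uint64_t instructions_canceled(uint64_t issued, uint64_t completed)
{
    assert(completed <= issued);
    return issued - completed;
}

/* Estimated D$ miss rate = (D$ miss) / (DTLB accessed).
 * Returns 0.0 when no accesses were counted. */
double dcache_miss_rate(uint64_t dcache_miss, uint64_t dtlb_accessed)
{
    if (dtlb_accessed == 0)
        return 0.0;
    return (double)dcache_miss / (double)dtlb_accessed;
}
```

As the text notes, the miss-rate estimate is only meaningful when no uncached loads or stores occur during the measurement.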
9.3.2 Handling Performance Counter Exceptions
A performance counter exception is detected by an instruction if the following condition holds true:

    ~STATUS.ERL && PCCR.CTE && (PCR0.OVFL || PCR1.OVFL)

Note that software should not rely on the exception occurring if the instruction is nullified, i.e. it appears in the branch delay slot of a branch-likely instruction that is not taken.
C790 Implementation Note: C790 implementation always counts events that occur within nullified instructions.
The instruction detecting a counter exception is canceled by the exception, and instruction execution continues as follows:

    if ( in branch delay slot ) {
        ErrorEPC = PC - 4;
        CAUSE.BD2 = 1;
    } else {
        ErrorEPC = PC;
        CAUSE.BD2 = 0;
    }
    if ( STATUS.DEV )
        PC = 0xBFC00280;    // Uncached counter xcp handler
    else
        PC = 0x80000080;    // "Normal" counter xcp handler
    STATUS.ERL = 1;
    CAUSE.EXC2 = 2;         // Counter exception

The description above makes use of the BD2 and EXC2 fields in the CAUSE register. Both are fields newly introduced in the C790 and occupy the bit positions shown below.
[Figure: bit layout of the 32-bit CAUSE register, showing the BD, BD2, CE, EXC2, SIOP, IP and EXC fields]
Figure 9-3. CAUSE Register Fields
C790 Programming Note: Note that the "normal" exception entry point is in kseg0 space. That is, the address is unmapped and the caching policy is determined by CONFIG.K0. If you don't want to disturb the cache while counting and stepping, kseg0 should be configured in "uncached" mode. If preserving cache data is less important than the speed of servicing performance counter overflow exceptions, kseg0 should be configured in "cached" mode.
9.3.3 Priority of Counter Exceptions
Counter exceptions have the highest priority after cold reset and NMI. If a cold reset occurs the processor is initialized - so a simultaneous counter exception is discarded. If an NMI occurs, the NMI handler is entered with either PCR0.OVFL or PCR1.OVFL (or both) set to 1, and ErrorEPC pointing at the instruction causing the counter overflow. (ErrorEPC is used because NMI is handled as a level 2 exception.) Once the NMI handler exits, the instruction that caused the overflow is re-executed. However, since PCR0.OVFL or PCR1.OVFL is 1, the instruction is canceled once more and the counter exception handler is entered.
9.3.4 Initializing Counters
Let us look at the code sequence needed to initialize the counters and activate them. In the example below, PCR0 is set up to count clocks in all operating modes and report a counter exception after the count exceeds 2^31. PCR1 is set up to count stores while in Supervisor mode only, and report a counter exception after the count exceeds 2^31. The code must be executed while in level 2 exception mode (ERL=1).

    STATUS.ERL = 1;       // Set ERL (to inhibit counting)
    ErrorEPC = ...;
    PCR0 = 0;             // Init PCR0, and ...
    PCCR.EVENT0 = 1;      // ... set up to count clocks ...
    PCCR.U0 = 1;          // ... in all privilege modes
    PCCR.S0 = 1;
    PCCR.K0 = 1;
    PCCR.EXL0 = 1;
    PCR1 = 0;             // Init PCR1, and ...
    PCCR.EVENT1 = 15;     // ... set up to count completed stores ...
    PCCR.U1 = 0;          // ... while in supervisor mode
    PCCR.S1 = 1;
    PCCR.K1 = 0;
    PCCR.EXL1 = 0;
    PCCR.CTE = 1;         // Enable global counter flag
    ERET                  // Execute ERET to clear ERL;
                          // counting begins with ERET's target.
                          // Note that the ERET instruction also
                          // guarantees that the COP0 state
                          // updated (e.g. PCCR) is valid.
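The example starts both counters at 0, so the exception arrives only after 2^31 events. To take the exception after a smaller number of events, software could instead preload a counter. The C sketch below models this; it is an assumption, not a documented procedure: it assumes a counter behaves as a plain 32-bit value in which a carry out of VALUE (bits 30:0) sets OVFL (bit 31), and the names `pcr_preload` and `pcr_step` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PCR_OVFL (UINT32_C(1) << 31)   /* bit 31, per Figure 9-2 */

/* Preload value that makes OVFL become 1 after n more counted events
 * (1 <= n <= 2^31). Assumes a carry from VALUE propagates into OVFL. */
uint32_t pcr_preload(uint32_t n)
{
    assert(n >= 1 && n <= PCR_OVFL);
    return PCR_OVFL - n;
}

/* Software model of one counted event. */
uint32_t pcr_step(uint32_t pcr)
{
    return pcr + 1;
}
```

Under this model, `pcr_preload(1)` combined with the `Non-BDS instruction completed' event would raise the counter exception after one completed instruction, which is one way the stepping use mentioned in Section 9.3.1.1 might be arranged.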
9.3.5 Notes on Reading Counters
Whenever you read a counter with MFC0 or MFPC, make sure that no counted event can occur during the read; otherwise you may get a wrong number. For example, a counter for a TLB event should be read from an unmapped area, and a counter for an instruction completion event should be read while ERL=1 (level 2 exception) or in another state in which the counter is disabled. Exactly when an event is counted is implementation dependent; it depends on the number of pipeline stages, among other things. To write code that is robust across silicon versions and mask versions, read the counters after flushing the pipeline with a SYNC.P instruction. Because the C790 is a pipelined processor, this is required for instruction completion type events. Some inaccuracy is inherent in event counting, so do not be surprised if different numbers are observed on different versions of the silicon or mask.
Chapter 10 Floating-Point Unit, CP1
10. Floating-Point Unit, CP1 (Option)
This chapter describes the floating-point operations, including the programming model, instruction set and formats. The floating-point operations fully conform to the requirements of ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.
10.1 Overview
All floating-point instructions, as defined in the MIPS ISA for the floating-point coprocessor, CP1, are processed by a hardware unit other than the one that executes integer instructions. The floating-point execution unit can be disabled by the coprocessor usability (CU) bit defined in the CP0 Status register.
10.2 Floating Point Register
10.2.1 Floating-Point General Registers (FGRs)
CP1 has a set of Floating-Point General Purpose registers (FGRs) that can be accessed in the following ways:
* As 32 general purpose registers (32 FGRs), each of which is 32 bits wide when the FR bit in the CPU Status register equals 0, or as 32 general purpose registers (32 FGRs), each of which is 64 bits wide when FR equals 1. The CPU accesses these registers through move, load, and store instructions.
* As 16 floating-point registers (see the next section for a description of FPRs), each of which is 64 bits wide, when the FR bit in the CPU Status register equals 0. The FPRs hold values in either single- or double-precision floating-point format. Each FPR corresponds to adjacently numbered FGRs as shown in Figure 10-1.
* As 32 floating-point registers (see the next section for a description of FPRs), each of which is 64 bits wide, when the FR bit in the CPU Status register equals 1. The FPRs hold values in either single- or double-precision floating-point format. Each FPR corresponds to an FGR as shown in Figure 10-1.
[Figure: with FR = 0, each even-numbered FPR (FPR0, FPR2, ..., FPR28, FPR30) is formed from a pair of 32-bit FGRs, the even-numbered FGR holding the least-significant word and the odd-numbered FGR the most-significant word; with FR = 1, FPR0 to FPR31 each correspond to one 64-bit FGR (FGR0 to FGR31). The floating-point control registers shown are the 32-bit Control/Status Register (FCR31) and the 32-bit Implementation/Revision Register (FCR0).]
Figure 10-1. FP Registers
10.2.2 Floating-Point Registers (FPRs)
The FPU provides:
* 16 Floating-Point registers (FPRs) when the FR bit in the Status register equals 0, or
* 32 Floating-Point registers (FPRs) when the FR bit in the Status register equals 1.
These 64-bit registers hold floating-point values during floating-point operations and are physically formed from the General Purpose registers (FGRs). When the FR bit in the Status register equals 1, the FPR references a single 64-bit FGR. The FPRs hold values in either single- or double-precision floating-point format. If the FR bit equals 0, only even numbers (the least register) can be used to address FPRs. When the FR bit is set to a 1, all FPR register numbers are valid. If the FR bit equals 0 during a double-precision floating-point operation, the general registers are accessed in double pairs. Thus, in a double-precision operation, selecting Floating-Point Register 0 (FPR0) actually addresses adjacent Floating-Point General Purpose registers FGR0 and FGR1.
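The relationship between FPR numbers and FGRs can be expressed as a small lookup. The following C sketch is illustrative only; the structure and function names are hypothetical, and it models just the rules stated above (even-numbered FPRs mapped to FGR pairs when FR = 0, one FGR per FPR when FR = 1).

```c
#include <assert.h>
#include <stdbool.h>

/* FGRs backing one FPR. hi is -1 when a single 64-bit FGR is used. */
struct fgr_pair {
    int lo;   /* FGR holding the least-significant word (or the whole value) */
    int hi;   /* FGR holding the most-significant word, or -1 */
};

/* Returns false if the FPR number is not valid for the given FR setting. */
bool fpr_to_fgr(int fpr, bool fr, struct fgr_pair *out)
{
    if (fpr < 0 || fpr > 31)
        return false;
    if (fr) {                 /* FR = 1: every FPR is one 64-bit FGR */
        out->lo = fpr;
        out->hi = -1;
        return true;
    }
    if (fpr & 1)              /* FR = 0: only even FPR numbers are valid */
        return false;
    out->lo = fpr;            /* e.g. FPR0 -> FGR0 (least) and FGR1 (most) */
    out->hi = fpr + 1;
    return true;
}
```

For example, with FR = 0 a double-precision access to FPR0 resolves to the pair FGR0/FGR1, matching the description in the text.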
10.2.3 Floating-Point Control Registers
The MIPS RISC architecture defines 32 floating-point control registers (FCRs); the C790 processor implements two of these registers: FCR0 and FCR31. These FCRs are described below:
* The Implementation/Revision register (FCR0) holds revision information.
* The Control/Status register (FCR31) controls and monitors exceptions, holds the result of compare operations, and establishes rounding modes.
* FCR1 to FCR30 are reserved.
Table 10-1 lists the assignments of the FCRs.
Table 10-1. Floating-Point Control Register Assignments

FCR Number     Use
FCR0           Coprocessor implementation and revision register
FCR1 to FCR30  Reserved
FCR31          Rounding mode, cause, trap enables, and flags
Implementation and Revision Register (FCR0)
The read-only Implementation and Revision register (FCR0) specifies the implementation and revision number of CP1. This information can determine the coprocessor revision and performance level, and can also be used by diagnostic software. Figure 10-2 shows the layout of the register; Table 10-2 describes the Implementation and Revision register (FCR0) fields.
Implementation/Revision Register (FCR0)

Bits    31:16  15:8  7:0
Field   0      Imp   Rev
Width   16     8     8
Figure 10-2. Implementation/Revision Register
Table 10-2. FCR0 Fields

Field  Description                          Initial value
Imp    Implementation number                0x38
Rev    Revision number in the form of y.x
0      Reserved. Returns zeroes when read.

Revision Number
The revision number is a value of the form y.x, where:
* y is a major revision number held in bits 7:4.
* x is a minor revision number held in bits 3:0.
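Splitting an FCR0 value into these fields is a matter of shifts and masks. The C sketch below uses hypothetical names; the raw value would be obtained with a CFC1 instruction on real hardware, which is outside the scope of this fragment.

```c
#include <assert.h>
#include <stdint.h>

struct fcr0_info {
    unsigned imp;     /* implementation number, bits 15:8 */
    unsigned major;   /* major revision y, bits 7:4 */
    unsigned minor;   /* minor revision x, bits 3:0 */
};

struct fcr0_info fcr0_decode(uint32_t fcr0)
{
    struct fcr0_info info;
    info.imp   = (fcr0 >> 8) & 0xFF;
    info.major = (fcr0 >> 4) & 0xF;
    info.minor = fcr0 & 0xF;
    return info;
}
```

For instance, a made-up register value of 0x3812 would decode to implementation 0x38 and revision 1.2; the revision digits here are purely illustrative.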
The revision number distinguishes some chip revisions; however, there is no guarantee that changes to the chip are necessarily reflected by the revision number, or that changes to the revision number necessarily reflect real chip changes. For this reason revision number values are not listed, and software should not rely on the revision number to characterize the chip.
IEEE Standard 754
IEEE Standard 754 specifies that floating-point operations detect certain exceptional cases, raise flags, and can invoke an exception handler when an exception occurs. These features are implemented in the MIPS architecture with the Cause, Enable, and Flag fields of the Control/Status register. The Flag bits implement IEEE 754 exception status flags, and the Cause and Enable bits implement exception handling.
Control/Status Register (FCR31)
The Control/Status register (FCR31) contains control and status information that can be accessed by instructions in either Kernel or User mode. FCR31 also controls the arithmetic rounding mode and enables User mode traps, as well as identifying any exceptions that may have occurred in the most recently executed floating-point instruction, along with any exceptions that may have occurred without being trapped. Figure 10-3 shows the format of the Control/Status register, and Table 10-3 describes the Control/Status register fields. Figure 10-4 shows the Control/Status register Cause, Flag, and Enable fields.
Control/Status Register (FCR31)

Bits    31:25  24  23  22:18  17:12   11:7     6:2    1:0
Field   0      FS  C   0      Cause   Enables  Flags  RM
                              EVZOUI  VZOUI    VZOUI
Width   7      1   1   5      6       5        5      2
Figure 10-3. FP Control/Status Register Bit Assignments
Table 10-3. Control/Status Register Fields

Field    Description
FS       When set, denormalized results can be flushed instead of causing an unimplemented operation exception.
C        Condition bit. See description of Control/Status register Condition bit.
Cause    Cause bits. See Figure 10-4 and the description of Control/Status register Cause, Flag, and Enable bits.
Enables  Enable bits. See Figure 10-4 and the description of Control/Status register Cause, Flag, and Enable bits.
Flags    Flag bits. See Figure 10-4 and the description of Control/Status register Cause, Flag, and Enable bits.
RM       Rounding mode bits. See Table 10-5 and the description of Control/Status register Rounding Mode Control bits.
             E   V   Z   O   U   I
Cause bits   17  16  15  14  13  12
Enable bits  -   11  10  9   8   7
Flag bits    -   6   5   4   3   2

E: Unimplemented Operation, V: Invalid Operation, Z: Division by Zero, O: Overflow, U: Underflow, I: Inexact Operation
Figure 10-4. Control/Status Register Cause, Flag, and Enable Fields
Control/Status Register FS Bit
The FS bit enables the flushing of denormalized values. When the FS bit is set and the Underflow and Inexact Enable bits are not set, denormalized results are flushed instead of causing an Unimplemented Operation exception. Results are flushed to either 0 or the minimum normalized value, depending upon the rounding mode (see Table 10-4 below), and the Underflow and Inexact bits of the Cause and Flag fields are set.
Table 10-4. Flush Values of Denormalized Results

Denormalized   Flushed Result by Rounding Mode
Result         RN   RZ   RP        RM
Positive       +0   +0   +2^Emin   +0
Negative       -0   -0   -0        -2^Emin
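Table 10-4 can be read as a two-input function of the sign of the denormalized result and the rounding mode. The C sketch below shows this for single precision; the names are hypothetical, and it relies on the standard C constant FLT_MIN being the minimum normalized single-precision value, i.e. 2^Emin.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Rounding modes, numbered as in the RM field (Table 10-5). */
enum rounding { RND_RN = 0, RND_RZ = 1, RND_RP = 2, RND_RM = 3 };

/* Flushed single-precision result per Table 10-4. */
float flush_denormal(int negative, enum rounding rm)
{
    if (!negative)
        return rm == RND_RP ? FLT_MIN : 0.0f;    /* +2^Emin or +0 */
    return rm == RND_RM ? -FLT_MIN : -0.0f;      /* -2^Emin or -0 */
}
```

The result moves away from zero only when the rounding direction points away from zero for that sign: RP for positive results and RM for negative ones.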
Control/Status Register Condition Bit
When a floating-point Compare operation takes place, the result is stored at bit 23, the Condition bit. The C bit is set to 1 if the condition is true; the bit is cleared to 0 if the condition is false. Bit 23 is affected only by compare and CTC1 instructions.
Control/Status Register Cause, Flag, and Enable Fields
Figure 10-4 illustrates the Cause, Flag, and Enable fields of the Control/Status register. The Cause and Flag fields are updated by all conversion, computational (except MOV.fmt), CTC1, reserved, and unimplemented instructions. All other instructions have no effect on these fields.
Cause Bits
Bits 17:12 in the Control/Status register contain Cause bits, as shown in Figure 10-4, which reflect the results of the most recently executed floating-point instruction. The Cause bits are a logical extension of the CP0 Cause register; they identify the exceptions raised by the last floating-point operation. If the corresponding Enable bit is set at the time of the exception, a floating-point exception is raised and trapped by the CPU. If more than one exception occurs on a single instruction, each appropriate bit is set. The Cause bits are updated by most floating-point operations. The Unimplemented Operation (E) bit is set to 1 if software emulation is required, otherwise it remains 0. The other bits are set to 1 or 0 to indicate the occurrence or non-occurrence (respectively) of an IEEE 754 exception. Within the set of floating-point instructions that update the Cause bits, the Cause field indicates the exceptions raised by the most recently executed instruction. When a floating-point exception is taken, no results are stored, and the only state affected is the Cause field.
Enable Bits
A floating-point exception is generated any time a Cause bit and the corresponding Enable bit are set. A floating-point operation that sets an enabled Cause bit forces an immediate floating-point exception, as does setting both Cause and Enable bits with CTC1. There is no enable for Unimplemented Operation (E). An Unimplemented exception always generates a floating-point exception.
Before returning from a floating-point exception, software must first clear the enabled Cause bits with a CTC1 instruction to prevent a repeat of the exception trapping. Thus, User mode programs can never observe enabled Cause bits set; if this information is required in a User mode handler, it must be passed somewhere other than the Status register. For a floating-point operation that sets only unenabled Cause bits, no floating-point exception occurs and the default result defined by IEEE 754 is stored. In this case, the exceptions that were caused by the immediately previous floating-point operation can be determined by reading the Cause field.
Flag Bits
The Flag bits are cumulative and indicate the exceptions that were raised by the operations that were executed since the bits were explicitly reset. Flag bits are set to 1 if an IEEE 754 exception is raised, otherwise they remain unchanged. The Flag bits are never cleared as a side effect of floating-point operations; however, they can be set or cleared by writing a new value into the Status register, using a CTC1 instruction. When a floating-point exception is trapped, the flag bits are not set by the hardware; floating-point exception software is responsible for setting these bits before invoking a user handler.
Control/Status Register Rounding Mode Control Bits
Bits 1 and 0 in the Control/Status register constitute the Rounding Mode (RM) field. As shown in Table 10-5, these bits specify the rounding mode that CP1 uses for all floating-point operations.
Table 10-5. Rounding Mode Bit Decoding

RM (1:0)  Mnemonic  Description
0         RN        Round result to nearest representable value; round to value with least-significant bit 0 when the two nearest representable values are equally near.
1         RZ        Round toward 0: round to value closest to and not greater in magnitude than the infinitely precise result.
2         RP        Round toward +∞: round to value closest to and not less than the infinitely precise result.
3         RM        Round toward -∞: round to value closest to and not greater than the infinitely precise result.
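The FCR31 fields described in Figures 10-3 and 10-4 and Table 10-5 can be extracted from a register value as sketched below in C. The macro and function names are hypothetical. The trap test follows the rule stated under Enable Bits: the E cause bit always traps, while V, Z, O, U and I trap only when the matching Enable bit is set.

```c
#include <assert.h>
#include <stdint.h>

#define FCR31_FS          (UINT32_C(1) << 24)
#define FCR31_C           (UINT32_C(1) << 23)
#define FCR31_CAUSE(v)    (((v) >> 12) & 0x3F)   /* bits 17:12: E V Z O U I */
#define FCR31_ENABLES(v)  (((v) >> 7) & 0x1F)    /* bits 11:7:    V Z O U I */
#define FCR31_FLAGS(v)    (((v) >> 2) & 0x1F)    /* bits 6:2:     V Z O U I */
#define FCR31_RM(v)       ((v) & 0x3)            /* bits 1:0 */

/* Nonzero if the Cause/Enable state would raise a floating-point exception. */
int fcr31_trap_pending(uint32_t v)
{
    uint32_t cause = FCR31_CAUSE(v);
    int unimplemented = (cause & 0x20) != 0;              /* E: no enable bit */
    int enabled = (cause & 0x1F & FCR31_ENABLES(v)) != 0; /* V Z O U I */
    return unimplemented || enabled;
}
```

Because the five low Cause bits and the five Enable bits are in the same V, Z, O, U, I order, a single AND after shifting is enough to match each Cause bit with its Enable bit.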
10.2.4 Accessing the FP Control and Implementation/Revision Registers
The Control/Status and the Implementation/Revision registers are read by a Move Control From Coprocessor 1 (CFC1) instruction. The bits in the Control/Status register can be set or cleared by writing to the register using a Move Control To Coprocessor 1 (CTC1) instruction. The Implementation/Revision register is a read-only register. There are no pipeline hazards (between any instructions) associated with floating-point control registers.
10.3 Floating-Point Formats
CP1 performs both 32-bit (single-precision) and 64-bit (double-precision) IEEE standard floating-point operations. The 32-bit single-precision format has a 24-bit signed-magnitude fraction field (f+s) and an 8-bit exponent (e), as shown in Figure 10-5.

Bits    31    30:23     22:0
Field   s     e         f
        Sign  Exponent  Fraction
Width   1     8         23
Figure 10-5. Single-Precision Floating-Point Format
The 64-bit double-precision format has a 53-bit signed-magnitude fraction field (f+s) and an 11-bit exponent, as shown in Figure 10-6.
Bits    63    62:52     51:0
Field   s     e         f
        Sign  Exponent  Fraction
Width   1     11        52
Figure 10-6. Double-Precision Floating-Point Format
As shown in the above figures, numbers in floating-point format are composed of three fields:
* sign field, s
* biased exponent, e = E + bias
* fraction, f = .b1b2...bp-1
where bias = 127 and p = 24 in single precision, and bias = 1023 and p = 53 in double precision.
The range of the unbiased exponent E includes every integer between the two values Emin and Emax inclusive, together with two other reserved values:
* Emin - 1 (to encode ±0 and denormalized numbers)
* Emax + 1 (to encode ±∞ and NaNs [Not a Number])
For single- and double-precision formats, each representable nonzero numerical value has exactly one encoding. The value of a number, v, is determined by the equations shown in Table 10-6.
Table 10-6. Equations for Calculating Values in Single- and Double-Precision Floating-Point Format

Equation                   Condition
v = NaN                    E = Emax+1 and f ≠ 0, regardless of s
v = (-1)^s ∞               E = Emax+1 and f = 0
v = (-1)^s 2^E (1.f)       Emin ≤ E ≤ Emax
v = (-1)^s 2^Emin (0.f)    E = Emin-1 and f ≠ 0
v = (-1)^s 0               E = Emin-1 and f = 0
For all floating-point formats, if v is NaN, the most-significant bit of f determines whether the value is a signaling or quiet NaN: v is a signaling NaN if the most-significant bit of f is set, otherwise, v is a quiet NaN. Table 10-7 defines the values for the format parameters; minimum and maximum floating-point values are given in Table 10-8.
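The conditions of Table 10-6, together with the NaN rule above, can be turned into a classifier for single-precision bit patterns. The C sketch below is an illustration with hypothetical names; it follows the convention stated in this section that a set most-significant fraction bit marks a signaling NaN, which is the reverse of the convention used by many other IEEE 754 implementations.

```c
#include <assert.h>
#include <stdint.h>

enum fp_class {
    FP_CLS_ZERO, FP_CLS_DENORMAL, FP_CLS_NORMAL,
    FP_CLS_INFINITY, FP_CLS_QNAN, FP_CLS_SNAN
};

enum fp_class classify_single(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFF;    /* biased exponent, e = E + 127 */
    uint32_t f = bits & 0x7FFFFF;        /* 23-bit fraction */

    if (e == 0xFF) {                     /* E = Emax + 1 */
        if (f == 0)
            return FP_CLS_INFINITY;
        /* Per the text: MSB of f set means signaling NaN. */
        return (f & 0x400000) ? FP_CLS_SNAN : FP_CLS_QNAN;
    }
    if (e == 0)                          /* E = Emin - 1 */
        return f ? FP_CLS_DENORMAL : FP_CLS_ZERO;
    return FP_CLS_NORMAL;                /* Emin <= E <= Emax */
}
```

The sign bit is ignored, so both +0 and -0 classify as zero and both infinities as infinity. The same structure applies to double precision with an 11-bit exponent and a 52-bit fraction.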
Table 10-7. Floating-Point Format Parameter Values
Parameter                Single   Double
Emax                     +127     +1023
Emin                     -126     -1022
Exponent bias            +127     +1023
Exponent width in bits   8        11
Integer bit              hidden   hidden
Fraction width in bits   23       52
Format width in bits     32       64

Excluding the sign bit.

Table 10-8. Minimum and Maximum Floating-Point Values

Type                  Value
Float Minimum         1.40129846e-45
Float Minimum Norm    1.17549435e-38
Float Maximum         3.40282347e+38
Double Minimum        4.9406564584124654e-324
Double Minimum Norm   2.2250738585072014e-308
Double Maximum        1.7976931348623157e+308
10.4 Binary Fixed-Point Format
Binary fixed-point values are held in 2's complement format. Unsigned fixed-point values are not directly provided by the floating-point instruction set. Figure 10-7 illustrates binary word fixed-point format and Figure 10-8 illustrates binary long fixed-point format; Table 10-9 lists the binary fixed-point format fields.
Bits    31    30:0
Field   Sign  Integer
Width   1     31
Figure 10-7. Binary Word Fixed-Point Format
Bits    63    62:0
Field   Sign  Integer
Width   1     63
Figure 10-8. Binary Long Fixed-Point Format
Field assignments of the binary fixed-point format are:

Table 10-9. Binary Fixed-Point Format Fields

Field    Description
sign     sign bit
integer  integer value (2's complement)
10.5 Floating-Point Instruction Set Summary
Each instruction is 32 bits long and aligned on a word boundary. This section gives an overview of the instructions for the floating-point unit. A detailed description of each instruction is provided in Appendix D.
10.5.1 Load, Store and Move Instructions (Table 10-10)
Load and Store instructions move data between memory and the FPU general purpose registers (FGRs), and Move instructions move data directly between the CPU and the FPU general purpose registers (FGRs). These instructions do not perform format conversions and therefore never cause floating-point exceptions. The instruction immediately following a load can use the contents of the loaded register. However, in that case the hardware interlocks, requiring additional real cycles. Thus, load delay slots should be scheduled to avoid the interlock.
Table 10-10. FPU Instruction Set (Optional): Load, Move and Store Instructions
Instruction   Description                                    Note
LWC1          Load Word to FPU (coprocessor 1)               MIPS I
SWC1          Store Word from FPU (coprocessor 1)            MIPS I
MTC1          Move Word to FPU (coprocessor 1)               MIPS I
MFC1          Move Word from FPU (coprocessor 1)             MIPS I
CTC1          Move Control Word to FPU (coprocessor 1)       MIPS I
CFC1          Move Control Word from FPU (coprocessor 1)     MIPS I
LDC1          Load Doubleword to FPU (coprocessor 1)         MIPS II
SDC1          Store Doubleword from FPU (coprocessor 1)      MIPS II
DMTC1         Move Doubleword to FPU (coprocessor 1)         MIPS III
DMFC1         Move Doubleword from FPU (coprocessor 1)       MIPS III
10.5.2 Conversion Instructions (Table 10-11)
Conversion instructions perform conversion operations between the various data formats.
Table 10-11. FPU Instruction Set (Optional): Conversion Instructions
Instruction    Description                                          Note
CVT.S.fmt      Floating-Point Convert to Single FP Format           MIPS I
CVT.W.fmt      Floating-Point Convert to Word Fixed-Point Format    MIPS I
CVT.D.fmt      Floating-Point Convert to Double FP Format           MIPS I
ROUND.W.fmt    Floating-point Round to Word Fixed-Point             MIPS II
TRUNC.W.fmt    Floating-point Truncate to Word Fixed-Point          MIPS II
CEIL.W.fmt     Floating-point Ceiling Convert to Word Fixed-Point   MIPS II
FLOOR.W.fmt    Floating-point Floor Convert to Word Fixed-Point     MIPS II
CVT.L.fmt      Floating-Point Convert to Long Fixed-Point Format    MIPS III
ROUND.L.fmt    Floating-point Round to Long Fixed-Point             MIPS III
TRUNC.L.fmt    Floating-point Truncate to Long Fixed-Point          MIPS III
CEIL.L.fmt     Floating-point Ceiling Convert to Long Fixed-Point   MIPS III
FLOOR.L.fmt    Floating-point Floor Convert to Long Fixed-Point     MIPS III
10.5.3 Computational Instructions (Table 10-12)
Computational instructions perform arithmetic operations on floating-point values in the FPU registers. There are two categories of computational instructions:
* 3-Operand Register-Type instructions, which perform floating-point addition, subtraction, multiplication, and division operations
* 2-Operand Register-Type instructions, which perform floating-point absolute value, move, negate, and square root operations.
Table 10-12. FPU Instruction Set (Optional): Computational Instructions
Instruction   Description                      Note
ADD.fmt       Floating-point Add               MIPS I
SUB.fmt       Floating-point Subtract          MIPS I
MUL.fmt       Floating-point Multiply          MIPS I
DIV.fmt       Floating-point Divide            MIPS I
ABS.fmt       Floating-point Absolute Value    MIPS I
MOV.fmt       Floating-point Move              MIPS I
NEG.fmt       Floating-point Negate            MIPS I
SQRT.fmt      Floating-point Square root       MIPS II
10.5.4 Compare and Branch Instructions (Table 10-13)
Compare instructions perform comparisons of the contents of registers and set a conditional bit based on the results. Branch on FPU Condition instructions perform a branch to the specified target if the specified coprocessor condition is met.
Table 10-13. FPU Instruction Set (Optional): Compare and Branch Instructions
Instruction   Description               Note
C.cond.fmt    Floating-point Compare    MIPS I
BC1T          Branch on FPU True        MIPS I
BC1F          Branch on FPU False       MIPS I
Chapter 11 Floating-Point Exception
11. Floating-Point Exception (Option)
This chapter describes FPU floating-point exceptions, including FPU exception types, exception trap processing, exception flags, saving and restoring state when handling an exception, and trap handlers for IEEE Standard 754 exceptions. A floating-point exception occurs whenever the FPU cannot handle either the operands or the results of a floating-point operation in its normal way. The FPU responds by generating an exception to initiate a software trap or by setting a status flag.
11.1 Introduction
This chapter describes floating-point exceptions, including FPU exception type, exception trap processing, exception flags, saving and restoring state when handling an exception, and trap handlers for IEEE Standard 754 exceptions.
11.2 Exception Types
The FP Control/Status register described in Chapter 10 contains an Enable bit for each exception type; exception Enable bits determine whether an exception will cause the FPU to initiate a trap or set a status flag.
* If a trap is taken, the FPU remains in the state found at the beginning of the operation and a software exception handling routine executes.
* If no trap is taken, an appropriate value is written into the FPU destination register and execution continues.
The FPU supports the five IEEE Standard 754 exceptions:
* Inexact (I)
* Underflow (U)
* Overflow (O)
* Division by Zero (Z)
* Invalid Operation (V)
Cause bits, Enables, and Flag bits (status flags) are used. The FPU adds a sixth exception type, Unimplemented Operation (E). This exception indicates the use of a software implementation. The Unimplemented Operation exception has no Enable or Flag bit; whenever this exception occurs, an unimplemented exception trap is taken. Figure 11-1 shows the Control/Status register bits that support exceptions.
Exception              Cause Bit #   Enable Bit #   Flag Bit #
Unimplemented (E)      17            -              -
Invalid (V)            16            11             6
Division by Zero (Z)   15            10             5
Overflow (O)           14            9              4
Underflow (U)          13            8              3
Inexact (I)            12            7              2
Figure 11-1. Control/Status Register Exception/Flag/Trap/Enable Bits
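The bit positions in Figure 11-1 can be decoded in software once the Control/Status register has been read (for example with CFC1). The Python sketch below assumes the positions shown in the figure (Cause bits 17 to 12, Enable bits 11 to 7, Flag bits 6 to 2); the helper names are hypothetical.

```python
def _names(field, letters):
    """Return the exception letters whose bit is set; letters[0] is the lowest bit."""
    return {name for i, name in enumerate(letters) if (field >> i) & 1}

def decode_fcsr(value):
    """Split a Control/Status register value into its exception fields."""
    return {
        "cause": _names(value >> 12, "IUOZVE"),   # bits 12..17
        "enables": _names(value >> 7, "IUOZV"),   # bits 7..11
        "flags": _names(value >> 2, "IUOZV"),     # bits 2..6
    }
```

A value with only bit 17 set, for instance, decodes to a cause of {"E"} with no enables or flags.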
11.3 Exception Trap Processing
When a floating-point exception trap is taken, the Cause register indicates that the floating-point coprocessor is the cause of the exception trap. The Floating-Point Exception (FPE) code is used, and the Cause bits of the floating-point Control/Status register indicate the reason for the floating-point exception. These bits are, in effect, an extension of the system coprocessor Cause register.
11.4 Flags
A Flag bit is provided for each IEEE exception. The Flag bit is set to 1 when its corresponding exception is asserted and no corresponding exception trap is signaled. The Flag bit is reset by writing a new value into the Status register; flags can be saved and restored by software either individually or as a group. When no exception trap is signaled, the floating-point coprocessor takes a default action, providing a substitute value for the exception-causing result of the floating-point operation. The particular default action taken depends upon the type of exception. Table 11-1 lists the default action taken by the FPU for each of the IEEE exceptions.
Table 11-1. Default FPU Exception Actions
Field  Description          Rounding Mode  Default action
I      Inexact exception    Any            Supply a rounded result
U      Underflow exception  RN             Modify underflow values to 0 with the sign of the intermediate result
                            RZ             Modify underflow values to 0 with the sign of the intermediate result
                            RP             Modify positive underflows to the format's smallest positive finite number; modify negative underflows to -0.
                            RM             Modify negative underflows to the format's smallest negative finite number; modify positive underflows to 0.
O      Overflow exception   RN             Modify overflow values to ∞ with the sign of the intermediate result
                            RZ             Modify overflow values to the format's largest finite number with the sign of the intermediate result
                            RP             Modify negative overflows to the format's most negative finite number; modify positive overflows to +∞
                            RM             Modify positive overflows to the format's largest finite number; modify negative overflows to -∞
Z      Division by zero     Any            Supply a properly signed ∞
V      Invalid operation    Any            Supply 2^31 - 1 result (Word Fixed-Point); supply 2^63 - 1 result (Long Fixed-Point); otherwise supply a quiet Not a Number
The FPU detects the eight exception causes internally. When the FPU encounters one of these unusual situations, it causes either an IEEE exception or an Unimplemented Operation exception (E). Table 11-2 lists the exception-causing situations and contrasts the behavior of the FPU with the requirements of the IEEE Standard 754.
Table 11-2. FPU Exception-Causing Conditions
FPA Internal Result             IEEE Standard 754  Trap Enable  Trap Disable  Notes
Inexact result                  I                  I            I             Loss of accuracy
Exponent overflow               O, I (*1)          O, I         O, I          Normalized exponent > Emax
Division by zero                Z                  Z            Z             Zero is (exponent=Emin-1, mantissa=0)
Overflow on convert to Integer  V                  V (*2)       V (*2)        Source out of integer range, ∞, NaN
Signaling NaN source            V                  V            V
Invalid operation               V                  V            V             0/0, etc.
Exponent underflow              U                  E            UI (*3)       Normalized exponent < Emin
Denormalized or QNaN            None               E            E             Denormalized is (exponent=Emin-1 and mantissa <> 0)
(*1) The IEEE Standard 754 specifies an inexact exception on overflow only if the overflow trap is disabled.
(*2) Some implementations, such as the TX49, trap as (E) and software support is required. In the TX79 implementation no software support is required.
(*3) Exponent underflow sets the U and I Cause bits if both the U and I Enable bits are not set and the FS bit is set; otherwise exponent underflow sets the E Cause bit.
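Note (*3) amounts to a small decision rule. The Python sketch below restates it as code; the function and argument names are illustrative only, and it models nothing beyond what the note states.

```python
def underflow_cause_bits(u_enabled, i_enabled, fs):
    """Return the Cause bits set by an exponent underflow, per note (*3)."""
    if not u_enabled and not i_enabled and fs:
        # Both traps disabled and FS set: hardware reports U and I.
        return {"U", "I"}
    # Any other combination is handed to software as Unimplemented (E).
    return {"E"}
```

With both Enable bits clear and FS set the result is {"U", "I"}; every other combination yields {"E"}.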
11.5 FPU Exceptions
The following sections describe the conditions that cause the FPU to generate each of its exceptions, and detail the FPU response to each exception-causing condition.
Inexact Exception (I)
The FPU generates the Inexact exception if one of the following occurs:
* the rounded result of an operation is not exact, or
* the rounded result of an operation overflows, or
* the rounded result of an operation underflows and both the Underflow and Inexact Enable bits are not set and the FS bit is set.
Trap Enabled Results: If Inexact exception traps are enabled, the result register is not modified and the source registers are preserved.
Trap Disabled Results: The rounded or overflowed result is delivered to the destination register if no other software trap occurs.
Invalid Operation Exception (V)
Floating-Point format operation
The Invalid Operation exception is signaled if one or both of the operands are invalid for an implemented operation. When the exception occurs without a trap, the MIPS ISA defines the result as a quiet Not a Number (QNaN) for Floating-Point format. The invalid operations are:
* Addition or subtraction: magnitude subtraction of infinities, such as (+∞) + (-∞) or (-∞) - (-∞)
* Multiplication: 0 times ∞, with any signs
* Division: 0/0, or ∞/∞, with any signs
* Comparison of predicates involving `<' or `>' without `?', when the operands are unordered
* Any arithmetic operation, when one or both operands is a signaling NaN. A move (MOV) operation is not considered to be an arithmetic operation, but absolute value (ABS) and negate (NEG) are considered to be arithmetic operations.
* Comparison or Conversion From Floating-point Format on a signaling NaN.
* Square root: √x, where x is less than zero.
Software can simulate the Invalid Operation exception for other operations that are invalid for the given source operands. Examples of these operations include IEEE Standard 754-specified functions implemented in software, such as Remainder: x REM y, where y is 0 or x is infinite; conversion of a floating-point number to a decimal format whose value causes an overflow, is infinity, or is NaN; and transcendental functions, such as ln(-5) or cos^-1(3). Refer to Appendix D for examples or for routines to handle these cases.
Trap Enabled Results: The result register is not modified, and the source registers are preserved.
Trap Disabled Results: A quiet NaN is delivered to the destination register if no other software trap occurs.
Conversion to Integer format
The Invalid Operation exception is also raised when the source operand is an Infinity (∞) or NaN, or the correctly rounded integer result is outside of the representable range.
Trap Enabled Results: The result register is not modified, and the source registers are preserved.
Trap Disabled Results: The result value 2^31 - 1 (for Word Fixed-Point) or 2^63 - 1 (for Long Fixed-Point) is delivered to the destination register if no other software trap occurs.
Note: `<', `>' and `?' are the notation used in IEEE Std 754. `?' means `unordered.' See the Compare instruction in Appendix D.
Division-by-Zero Exception (Z)
The Division-by-Zero exception is signaled on an implemented divide operation if the divisor is zero and the dividend is a finite nonzero number. Software can simulate this exception for other operations that produce a signed infinity, such as ln(0), sec(π/2), csc(0), or 0^-1.
Trap Enabled Results: The result register is not modified, and the source registers are preserved.
Trap Disabled Results: The result, when no trap occurs, is a correctly signed infinity.
Overflow Exception (O)
The Overflow exception is signaled when the magnitude of the rounded floating-point result, with an unbounded exponent range, is larger than the largest finite number of the destination format. (This exception also signals an Inexact exception.)
Trap Enabled Results: The result register is not modified, and the source registers are preserved.
Trap Disabled Results: The result, when no trap occurs, is determined by the rounding mode and the sign of the intermediate result (see Table 11-3).
Table 11-3. Values of Overflow Results
Rounding Mode   Flushed result (Denormalized Result)
                Positive    Negative
RN              +∞          -∞
RZ              +Emax       -Emax
RP              +∞          -Emax
RM              +Emax       -∞
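The overflow defaults can be summarized as a lookup on rounding mode and sign. The Python sketch below follows the overflow rows of Table 11-1 (infinity or the format's largest finite number); the function name and the `largest` parameter are assumptions made for illustration, with the host's double-precision maximum as the default.

```python
import math
import sys

def overflow_result(mode, negative, largest=sys.float_info.max):
    """Default result of a trap-disabled overflow for rounding mode RN/RZ/RP/RM."""
    # For each mode: (positive overflow goes to infinity?, negative overflow goes to infinity?)
    to_infinity = {
        "RN": (True, True),
        "RZ": (False, False),
        "RP": (True, False),
        "RM": (False, True),
    }[mode][1 if negative else 0]
    magnitude = math.inf if to_infinity else largest
    return -magnitude if negative else magnitude
```

For example, a negative overflow under RP returns the most negative finite number, while a positive overflow under RP returns +∞.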
Underflow Exception (U)
Two related events contribute to the Underflow exception:
* creation of a tiny nonzero result between ±2^Emin which can cause some later exception because it is so tiny
* extraordinary loss of accuracy during the approximation of such tiny numbers by denormalized numbers.
IEEE Standard 754 allows a variety of ways to detect these events, but requires they be detected the same way for all operations. Tininess can be detected by one of the following methods:
* after rounding (when a nonzero result, computed as though the exponent range were unbounded, would lie strictly between ±2^Emin)
* before rounding (when a nonzero result, computed as though the exponent range and the precision were unbounded, would lie strictly between ±2^Emin).
The MIPS architecture requires that tininess be detected after rounding. Loss of accuracy can be detected by one of the following methods:
* denormalization loss (when the delivered result differs from what would have been computed if the exponent range were unbounded)
* inexact result (when the delivered result differs from what would have been computed if the exponent range and precision were both unbounded).
The MIPS architecture requires that loss of accuracy be detected as an inexact result.
Trap Enabled Results: If Underflow or Inexact traps are enabled, or if the FS bit is not set, then an Unimplemented exception (E) is generated, and the result register is not modified and the source registers are preserved.
Trap Disabled Results: If Underflow and Inexact traps are not enabled and the FS bit is set, the result is determined by the rounding mode and the sign of the intermediate result (see Table 10-4).
Unimplemented Instruction Exception (E)
Any attempt to execute an instruction with an operation code or format code that has been reserved for future definition sets the Unimplemented bit in the Cause field in the FPU Control/Status register and traps. The operand and destination registers remain undisturbed and the instruction is emulated in software. Any of the IEEE Standard 754 exceptions can arise from the emulated operation, and these exceptions are simulated. The Unimplemented Instruction exception can also be signaled when unusual operands or result conditions are detected that the implemented hardware cannot handle properly. These include:
* Denormalized operand, except for Compare instruction
* Quiet Not a Number operand, except for Compare instruction
* Denormalized result or Underflow, when either Underflow or Inexact Enable bit is set or the FS bit is not set
* Reserved opcodes
* Unimplemented formats
* Operations which are invalid for their format (for instance, CVT.S.S)
NOTE: Denormalized and NaN operands are only trapped if the instruction is a convert or a computational operation. A move operation does not trap if its operands are either denormalized or NaNs.
The use of this exception for such conditions is optional; most of these conditions are newly developed and are not expected to be widely used in early implementations. Loopholes are provided in the architecture so that these conditions can be implemented with assistance provided by software, maintaining full compatibility with the IEEE Standard 754.
Trap Enabled Results: The result register is not modified, and the source registers are preserved.
Trap Disabled Results: This trap cannot be disabled.
11.6 Saving and Restoring State
Sixteen doubleword coprocessor load or store operations save or restore the coprocessor floating-point register state in memory. The remainder of control and status information can be saved or restored through CFC1/CTC1 instructions, and saving and restoring the processor registers. Normally, the Control/Status register is saved first and restored last. When state is restored, state information in the Control/Status register indicates the exceptions that are pending. Writing a zero value to the Cause field of Control/Status register clears all pending exceptions, permitting normal processing to restart after the floating-point register state is restored.
11.7 Trap Handlers for IEEE Standard 754 Exceptions
The IEEE Standard 754 strongly recommends that users be allowed to specify a trap handler for any of the five standard exceptions so that a software subroutine can return a value to be used instead of the exceptional operation's result; the trap handler can either compute or specify a substitute result to be placed in the destination register of the operation. By retrieving an instruction using the processor Exception Program Counter (EPC) register, the trap handler determines:
* the exceptions that occurred during the operation
* the operation being performed
* the destination format
On Overflow or Underflow exceptions (except for conversions), and on Inexact exceptions, the trap handler gains access to the correctly rounded result by decoding the source register field of the instruction code and simulating the operation in software. On Overflow or Underflow exceptions caused by a floating-point conversion, on Invalid Operation and on Division-by-Zero exceptions, the trap handler gains access to the operand values by decoding the source register field of the instruction code. The IEEE Standard 754 recommends that, if enabled, the overflow and underflow traps take precedence over a separate inexact trap. This prioritization is accomplished in software; hardware sets the bits for both the Inexact exception and the Overflow or Underflow exception.
Note: 32 doubleword load or store operations are needed instead of sixteen if the FR bit is set to 1.
Chapter 12 PC Trace
12. PC Trace
This chapter describes the trace functions present on the C790.
The C790 supports real-time PC tracing. Pipeline status, target addresses of indirect jumps, and exception vectors are made available on special signals. The executed instruction sequence can be restored from signals and the source program. The C790 also supports hardware breakpoints. The breakpoint facility is described in Chapter 13.
12.1 Real-Time PC Tracing
Trace information and non-sequential Program Counters are made available on special signal lines of the CPU. The following trace information is made available:
* Instruction being executed in pipeline 0
* Instruction being executed in pipeline 1
* Current execution status (Normal (sequential), Branch Taken, Jump Target, Exception Target)
For Indirect jumps, the target address is also made available. For exception vectors, a code for the exception vector address is made available.
12.1.1 Classification of Branch and Jump Instructions
In this chapter, branches and jumps are classified into three categories (direct jump, indirect jump, and branch) in order to explain the function of PC trace. The classification is shown in Table 12-1.
Table 12-1. Classification of Branch and Jump Instructions
Class           Instruction
Jump            Direct or Indirect Jump
Direct Jump     J or JAL Instruction
Indirect Jump   JR, JALR or ERET Instruction
Branch          Any conditional branch Instruction
12.1.2 PC Trace Signals
All PC trace signals operate at half the C790 CPU clock frequency using the BUSCLK clock signal. Because of the half frequency operation there are pairs of signals which indicate the status of execution within the CPU pipelines. Phase A signals show the status corresponding to the even CPU clock cycle and Phase B signals show the status corresponding to the odd CPU clock cycle. As can be seen from the following figure the execution status of the CPU pipeline during time 0 (all time references are in relation to the CPU clock) is put on the phase A signals at the next rising edge of BUSCLK during time 2. Similarly the execution status of the CPU pipeline during time 1 is put on the phase B signals.
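[Timing diagram: CPUCLK cycles 0 to 10 alternate between Phase A (even cycles) and Phase B (odd cycles). On successive BUSCLK cycles the Phase A signals carry the status of CPU cycles 0, 2, 4, 6 and the Phase B signals carry the status of CPU cycles 1, 3, 5, 7.]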
The following signals are made available for real-time PC tracing.
* P0EXEA*    (Phase A Pipeline 0 Execution Status)   Output
* P1EXEA*    (Phase A Pipeline 1 Execution Status)   Output
* JMPA*      (Phase A Jump)                          Output
* P0EXEB*    (Phase B Pipeline 0 Execution Status)   Output
* P1EXEB*    (Phase B Pipeline 1 Execution Status)   Output
* JMPB*      (Phase B Jump)                          Output
* TPCE*      (Target PC Enable)                      Output
* TPC[3:0]   (Target PC Bus)                         Output
(1) P0EXEA* (Phase A Pipeline 0 Execution Status) Output
P0EXEA indicates whether an instruction has completed execution without generating an exception (retired) via Pipeline 0 during phase A.
0: An instruction was retired.
1: No instruction was retired.
(2) P1EXEA* (Phase A Pipeline 1 Execution Status) Output
P1EXEA indicates whether an instruction retired via Pipeline 1 during phase A. Note that if this signal is asserted at the same time as P0EXEA*, two instructions were retired simultaneously during phase A via pipelines 0 and 1, but there is no indication as to which specific instruction was retired via which pipeline.
0: An instruction was retired.
1: No instruction was retired.
(3) JMPA* (Jump Phase A) Output
A jump was retired during phase A or a conditional branch instruction was retired and the branch was taken during phase A. Note that exceptions do not assert this signal.
0: Jump or conditional branch instruction was retired.
1: No Jump or conditional branch instruction was retired.
(4) P0EXEB* (Phase B Pipeline 0 Execution Status) Output
P0EXEB indicates whether an instruction retired via Pipeline 0 during phase B.
0: An instruction was retired.
1: No instruction was retired.
(5) P1EXEB* (Phase B Pipeline 1 Execution Status) Output
P1EXEB indicates whether an instruction retired via Pipeline 1 during phase B. Note that if this signal is asserted at the same time as P0EXEB*, two instructions were retired simultaneously during phase B via pipelines 0 and 1, but there is no indication as to which specific instruction was retired via which pipeline.
0: An instruction was retired.
1: No instruction was retired.
(6) JMPB* (Jump Phase B) Output
A jump was retired during phase B or a conditional branch instruction was retired and the branch was taken during phase B. Note that exceptions do not assert this signal.
0: Jump or conditional branch instruction was retired.
1: No Jump or conditional branch instruction was retired.
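A trace tool that samples these six active-low signals once per BUSCLK cycle could summarize them as in the Python sketch below. The function name and the returned dictionary layout are hypothetical; only the 0/1 encodings come from the signal descriptions above.

```python
def decode_bus_cycle(p0exea, p1exea, jmpa, p0exeb, p1exeb, jmpb):
    """Summarize one BUSCLK sample; every argument is a pin level, 0 = asserted."""
    def phase(p0, p1, jmp):
        return {
            # Each asserted PnEXE signal means one instruction retired.
            "retired": (p0 == 0) + (p1 == 0),
            # JMP is asserted for a retired jump or a taken conditional branch.
            "jump_or_taken_branch": jmp == 0,
        }
    return {"A": phase(p0exea, p1exea, jmpa), "B": phase(p0exeb, p1exeb, jmpb)}
```

Summing the `retired` counts over successive samples gives the number of instructions executed, which is what lets the instruction sequence be replayed against the source program.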
(7) TPCE* (Target PC Enable) Output
When this signal is asserted the TPC bus indicates the type of target PC that will be made available.
0: TPC bus indicates type of target PC.
1: TPC bus has either the target PC or the exception vector address code or has no information.
The normal sequence of operation for the TPCE* and the TPC[3:0] signals is as follows: First TPCE* is asserted and simultaneously TPC[3:0] contains information about the type of the target PC (non-sequential PC). Next TPCE* is deasserted and either the target PC for indirect jumps is made available on the TPC[3:0] bus or, for exceptions, an exception vector address code is made available on the TPC[3:0] bus.
(8) TPC[3:0] (Target PC) Output
TPC[3:0] indicates either the type of the target PC address, the target address of an indirect jump instruction, or an exception vector address code.
TPC[3:0] when TPCE* is asserted
When TPCE* is asserted the type of the target PC address is made available on TPC[3:0]. Each bit of TPC[3:0] indicates a different type and multiple bits can be active at the same time.
* TPC[0]: Jump Target during Phase A
When this signal is asserted it indicates that the target instruction of an Indirect Jump instruction (includes JR, JALR and ERET) is retired during Phase A. The target address is made available on TPC[3:0] in the next cycle if neither TPC[2] nor TPC[3] is asserted simultaneously with this signal.
* TPC[1]: Exception Target during Phase A
When this signal is asserted it indicates that the first instruction of an exception handler is retired during Phase A. The exception vector address is made available on TPC[3:0] in the next cycle if neither TPC[2] nor TPC[3] is asserted simultaneously with this signal.
* TPC[2]: Jump Target during Phase B
When this signal is asserted it indicates that the target instruction of an Indirect Jump instruction is retired during Phase B. The target address is made available on TPC[3:0] in the next cycle.
* TPC[3]: Exception Target during Phase B
When this signal is asserted it indicates that the first instruction of an exception handler is retired during Phase B. The exception vector address is made available on TPC[3:0] in the next cycle.
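The type encoding can be illustrated with a short decoder. The Python sketch below assumes each TPC bit is active low (0 means asserted), which is consistent with the "equal to 0" wording used for the deasserted case and with type values such as 1110 in the example waveforms; the names are illustrative only.

```python
TPC_TYPES = (
    "jump target in phase A",        # TPC[0]
    "exception target in phase A",   # TPC[1]
    "jump target in phase B",        # TPC[2]
    "exception target in phase B",   # TPC[3]
)

def decode_tpc_type(tpc):
    """Return the target types flagged on TPC[3:0] while TPCE* is asserted."""
    # Assumption: a bit value of 0 means the corresponding type is asserted.
    return [name for bit, name in enumerate(TPC_TYPES) if ((tpc >> bit) & 1) == 0]
```

Under that assumption the value 1110 decodes to a jump target in phase A and 0111 to an exception target in phase B.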
TPC[3:0] when TPCE* is deasserted
When TPCE* is not asserted TPC[3:0] can be carrying one of the following three types of information:
1. There is no meaningful information on TPC. This happens most of the time when the program is executing sequentially.
2. The target address is made available because in the previous cycle TPCE* was asserted and TPC[0] or TPC[2] was equal to 0. The target address starts with the least significant four bits of the target instruction address (bits[5:2]).
3. An exception vector address code is made available because in the previous cycle TPCE* was asserted and TPC[1] or TPC[3] was equal to 0. The exception vector address codes are shown in Table 12-2.
Table 12-2. Exception Vector Address Codes
Exception             STATUS.BEV  STATUS.DEV  STATUS.EXL  Vector Address  Code (TPC[3:0])
Reset, NMI            x           x           x           0xBFC0 0000     8  (1000)
TLB Miss              1           x           0           0xBFC0 0200     12 (1100)
TLB Miss              0           x           0           0x8000 0000     0  (0000)
TLB Miss              1           x           1           0xBFC0 0380     15 (1111)
TLB Miss              0           x           1           0x8000 0180     3  (0011)
Debug & SIO           x           1           x           0xBFC0 0300     14 (1110)
Debug & SIO           x           0           x           0x8000 0100     2  (0010)
Performance Counter   x           1           x           0xBFC0 0280     13 (1101)
Performance Counter   x           0           x           0x8000 0080     1  (0001)
Interrupt             1           x           x           0xBFC0 0400     9  (1001)
Interrupt             0           x           x           0x8000 0200     4  (0100)
Common                1           x           x           0xBFC0 0380     15 (1111)
Common                0           x           x           0x8000 0180     3  (0011)
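A trace decoder can map the 4-bit code back to a vector address with a lookup table. The Python sketch below uses the code and address pairs as read from Table 12-2; the names `VECTOR_BY_CODE` and `vector_address` are hypothetical. Codes 15 and 3 are shared by the TLB Miss (EXL=1) and Common vectors, so the code alone does not distinguish those two cases.

```python
# Code (TPC[3:0]) -> exception vector address, as read from Table 12-2.
VECTOR_BY_CODE = {
    0b1000: 0xBFC00000,  # Reset, NMI
    0b1100: 0xBFC00200,  # TLB Miss, BEV=1, EXL=0
    0b0000: 0x80000000,  # TLB Miss, BEV=0, EXL=0
    0b1111: 0xBFC00380,  # TLB Miss (EXL=1) or Common, BEV=1
    0b0011: 0x80000180,  # TLB Miss (EXL=1) or Common, BEV=0
    0b1110: 0xBFC00300,  # Debug & SIO, DEV=1
    0b0010: 0x80000100,  # Debug & SIO, DEV=0
    0b1101: 0xBFC00280,  # Performance Counter, DEV=1
    0b0001: 0x80000080,  # Performance Counter, DEV=0
    0b1001: 0xBFC00400,  # Interrupt, BEV=1
    0b0100: 0x80000200,  # Interrupt, BEV=0
}

def vector_address(code):
    """Return the vector address for a 4-bit code, or None if the code is unlisted."""
    return VECTOR_BY_CODE.get(code & 0xF)
```

A table is used rather than a formula because the listed codes do not follow a single bit pattern of the address (for instance, 0xBFC0 0400 maps to 1001).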
12.1.3 Priority of Target Addresses
The target address for an indirect jump instruction or an exception vector address code is made available on TPC[3:0]. For an indirect jump instruction it takes multiple cycles (8 BUSCLK cycles or 16 CPU clock cycles) for the complete target address to be made available on the TPC[3:0] bus. As such, multiple conditions can occur simultaneously and there are certain priorities associated with putting out the target address. The rules governing what is made available on the TPC[3:0] bus are listed below:
1. If a new indirect jump instruction is retired while the target address PC for a previous indirect instruction is still being put out on TPC[3:0], the new indirect jump instruction's target PC will be signaled and start coming out on the TPC[3:0] bus and the previous target PC output will be terminated.
2. If an exception is taken while the target address PC for a previous indirect instruction is still being put out on TPC[3:0], the exception vector address code will be signaled and start coming out on the TPC[3:0] bus and the previous target PC output will be terminated.
The rules are also described in the following flowchart.
[Flowchart: the decision "New Indirect Jump or Exception Target Retired?" has an Exception branch and an Indirect Jump branch; each branch then tests "Previous Target Address Is Being Output Currently?" (Yes/No). Box labels: Suspend Outputting Previous Target Address Output; Terminate Outputting Current PC Output; Output Exception Target; Start Outputting Target Address of Jump; Resume Outputting Previous Target Address.]
Figure 12-1. Priority of Outputting Jump or Exception Target
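Because the target address arrives four bits at a time over 8 BUSCLK cycles, starting with bits [5:2], a trace tool has to reassemble it. The Python sketch below shows one way to do that. It assumes the samples arrive in ascending order (TA[5:2], TA[9:6], and so on up to TA[31:30]), that the final sample carries TA[31:30] in its two low-order bits, and that address bits [1:0] are zero; the second assumption in particular is not stated in this chapter, and the function name is hypothetical.

```python
def assemble_target(nibbles):
    """Rebuild a 32-bit target address from eight TPC[3:0] samples."""
    if len(nibbles) != 8:
        raise ValueError("expected 8 samples, one per BUSCLK cycle")
    address = 0
    for i, nibble in enumerate(nibbles):
        # Sample i supplies address bits [5 + 4*i : 2 + 4*i].
        address |= (nibble & 0xF) << (2 + 4 * i)
    # Assumption: only two bits of the last sample are meaningful (TA[31:30]).
    return address & 0xFFFFFFFF
```

If a new type indication is seen (TPCE* asserted) before all eight samples have been collected, the partial address should be discarded, in line with rules 1 and 2 above.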
12.1.4 Examples of PC Tracing
The following sections contain examples of program execution and the corresponding waveforms of the PC trace signals. Note that when two instructions are retired simultaneously, the examples indicate which instruction is executed in which pipeline purely for the sake of illustration. In reality, it is not known in this case which instruction is retired from which pipeline.
12.1.4.1 Sequential Execution
This is an example of sequential program execution. The program fragment is as follows:
      mul
      add
      sub
      lw    r1
      add
      sub   ,,r1
      add
      add
The PC trace signals for the program fragment are shown below:
[Waveform: Phase, CPUCLK, BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. Pipe 0 executes mul, sub, add, add; Pipe 1 executes add, lw, sub, add.]
Figure 12-2. Waveform for Sequential Execution
12.1.4.2 Conditional Branch
This is an example of a program with conditional branch instructions. Both the branch taken and branch not taken cases are illustrated. The program fragment is as follows:
      add
      add
      beq   L0     # Not Taken
      lw
      add
      beq   L1     # Taken
      add
      ....
L1:   add
      bne   L2     # Taken
      sll
      ....
L2:   sub
      sub
The PC trace signals for the program fragment are shown below:
[Waveform: Phase, CPUCLK, BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. The first beq is marked Not Taken; the second beq and the bne are marked Taken.]
Figure 12-3. Waveform for Conditional Branch
12.1.4.3 Indirect Jump (Target in Phase A)
This is an example of a program with an indirect jump instruction which is retired during phase B. The program fragment is as follows:
      add
      add
      jr    L1
      lw
      ....
L1:   xor
      add
      ori
      ori
      sw
      sll
      sub
      sub
The PC trace signals for the program fragment are shown below:
[Waveform: Phase, CPUCLK, BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. The target instruction is xor. TPC[3:0] carries 1110, then TA[5:2] through TA[31:30] (9 Bus Cycles). TA[x:y] = Target address bit x to y.]
Figure 12-4. Waveform for Indirect Jump (Target in Phase A)
12.1.4.4 Indirect Jump (Target in Phase B)
This is an example of a program with an indirect jump instruction which is retired during phase A. The program fragment is as follows:
      add
      add
      jr    L1
      lw
      ....
L1:   xor
      add
      ori
      ori
      sw
      sll
      sub
      sub
The PC trace signals for the program fragment are shown below:
[Waveform: Phase, CPUCLK, BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. The target instruction is xor. TPC[3:0] carries 1011, then TA[5:2], TA[9:6] through TA[31:30] (8 Bus Cycles).]
Figure 12-5. Waveform for Indirect Jump (Target in Phase B)
12.1.4.5 Indirect Jump (During Target PC Output)
This is an example of a program with two indirect jump instructions. While the target address PC associated with the first indirect jump instruction is being put out, the second indirect jump instruction is retired. Thus the first target PC output is terminated and the second target PC output is signaled and then made available. The program fragment is as follows:
      add
      add
      jr    L1
      lw
      ....
L1:   xor
      add
      jr    L2
      add
      ....
L2:   sw
      sll
      sub
      sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. Both jr instructions are marked on JMPB*; the targets xor and sw are marked on TPCE*. TPC[3:0] carries 1110 and TA[5:2] for the first jump, then starts again with 1110 and TA[5:2] for the second jump.]
Figure 12-6. Waveform for Indirect Jump (During Target PC Output)
12.1.4.6 Exception (Target in Phase B)
This is an example of a program which generates an exception. The target instruction (first instruction of the exception handler) retires in phase B. The program fragment is shown below. The label ExHnd identifies the first instruction of the exception handler.
        add
        add
        add
        lw
        teq             # Generates exception
        ....
ExHnd:  xor
        add
        sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted between the exception and its target. The target instruction xor is marked on TPCE*. TPC[3:0] carries the code 0111 followed by E.Code. E.Code = Exception Vector Code.]
Figure 12-7. Waveform for Exception (Target in Phase B)
12.1.4.7 Exception (During Target PC Output)
This is an example of a program which generates an exception while the target PC from an earlier indirect jump instruction is being output. The target PC output is terminated, and the exception vector address code is signaled and then made available. The target instruction (first instruction of the exception handler) retires in phase B. The program fragment is shown below. The label ExHnd identifies the first instruction of the exception handler.
        add
        add
        add
        lw
        teq             # Generates exception
        ....
ExHnd:  xor
        add
        sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted between the exception and its target. The target instruction xor is marked on TPCE*. TPC[3:0] is carrying TA[13:10], TA[17:14], TA[21:18] from the earlier jump when the output switches to the code 0111 followed by E.Code. TA[xx:yy] = target address bits xx to yy; E.Code = Exception Vector Code.]
Figure 12-8. Waveform for Exception (During Target PC Output)
12.1.4.8 Exception Generated by Branch or Jump Instruction
This is an example of a program in which an indirect jump instruction generates an exception. The program therefore jumps to the exception handler, and only the exception vector address code is indicated, not the jump. The target instruction (first instruction of the exception handler) retires in phase B. The program fragment is shown below. The label ExHnd identifies the first instruction of the exception handler.
        add
        add
        add
        lw
        jr              # Generates an exception
        nop             # Branch delay slot
        ....
ExHnd:  xor
        add
        sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted between the exception and its target. The target instruction xor is marked on TPCE*; no jump is marked on JMPA* or JMPB*. TPC[3:0] carries the code 0111 followed by E.Code. E.Code = Exception Vector Code.]
Figure 12-9. Waveform for Exception Generated by Branch or Jump Instruction
12.1.4.9 Exception Generated by Branch Delay Slot Instruction
This is an example of a program in which the branch delay slot instruction generates an exception. The program therefore jumps to the exception handler, and only the exception vector address code is indicated, not the jump. The target instruction (first instruction of the exception handler) retires in phase B. The program fragment is shown below. The label ExHnd identifies the first instruction of the exception handler.
        add
        add
        add
        lw
        jr
        lw              # Generates an exception
        ....
ExHnd:  xor
        add
        sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted between the exception and its target. The jr is marked on JMPA* and the target instruction xor on TPCE*. TPC[3:0] carries the code 0111 followed by E.Code. E.Code = Exception Vector Code.]
Figure 12-10. Waveform for Exception Generated by Branch Delay Slot Instruction
12.1.4.10 Exception Generated by Target Instruction
This is an example of a program in which the target instruction of an indirect jump generates an exception. The program therefore jumps to the exception handler, and only the exception vector address code is indicated, not the jump. The target instruction (first instruction of the exception handler) retires in phase B. The program fragment is shown below. The label ExHnd identifies the first instruction of the exception handler.
        add
        add
        add
        lw
        jr      L1
        nop
        ....
L1:     lw              # Generates an exception
        and
        ....
ExHnd:  xor
        add
        sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted. The jr is marked on JMPA* and the handler's first instruction xor on TPCE*. TPC[3:0] carries the code 0111 followed by E.Code.]
Figure 12-11. Waveform for Exception Generated by Target Instruction
12.1.4.11 Back to Back Exceptions (Case I)
This is an example of a program in which two back-to-back exceptions are generated. The program jumps to the first exception handler but then immediately jumps to the second exception handler. The target instruction (first instruction of the second exception handler) retires in phase A. The exception vector address code for the first handler is never made available. The program fragment is shown below. The label ExHnd1 identifies the first instruction of the first exception handler and the label ExHnd2 identifies the first instruction of the second exception handler.
        add
        add             # Generates the first exception
        ....
ExHnd1: xor             # Generates the second exception
        xor
        ....
ExHnd2: sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted. The target instruction sw is marked on P1EXEA* and TPCE*. TPC[3:0] carries the code 1101 followed by E.Code. E.Code = Exception Vector Code.]
Figure 12-12. Waveform for Back to Back Exceptions (Case I)
12.1.4.12 Back to Back Exceptions (Case II)
This is an example of a program in which two (almost) back-to-back exceptions are generated. The program jumps to the first exception handler and then generates an exception when executing the second instruction of that handler. It then jumps to the second exception handler. The target instruction (first instruction of the first exception handler) retires in phase A. In contrast to the case discussed above, the exception vector address codes for both handlers are made available. The program fragment is shown below. The label ExHnd1 identifies the first instruction of the first exception handler and the label ExHnd2 identifies the first instruction of the second exception handler.
        add
        add             # Generates the first exception
        ....
ExHnd1: xor
        xor             # Generates the second exception
        ....
ExHnd2: sw
        sll
        sub
        sub
The PC trace signals for the program fragment are shown below:
[Waveform not reproduced. Traces: Phase (CPUCLK A/B), BUSCLK, Pipe 0, Pipe 1, P0EXEA*, P1EXEA*, P0EXEB*, P1EXEB*, JMPA*, JMPB*, TPCE*, TPC[3:0]. More stall cycles might be inserted. The two targets xor and sw are marked on TPCE*. TPC[3:0] carries the code 1101 followed by E.Code for each of the two exceptions. E.Code = Exception Vector Code.]
Figure 12-13. Waveform for Back to Back Exceptions (Case II)
Chapter 13 Hardware Breakpoint
13. Hardware Breakpoint
This chapter describes hardware breakpoint functions for debugging present on the C790.
13.1 Hardware Breakpoint
The C790 provides a hardware breakpoint mechanism for debugging. (In this section, a hardware breakpoint is sometimes referred to simply as a "breakpoint".) This function allows users to set an instruction breakpoint and a data address/value breakpoint, and signals the occurrence of a breakpoint event to an external probe. The features of the breakpoint function are summarized below.
* Provides both instruction and data breakpointing on virtual addresses.
* Instruction address breakpoint with address masking.
* Data breakpoint with masking. A data breakpoint can be set on the following events:
    Address with masking
    Value with masking
    Read/write
* Independent exception event control for instruction and data.
* Individual event control by processor operating mode/exception level.
* Provides a trigger signal to external probes, synchronized with the breakpointing event.
Hardware breakpointing is implemented as part of Coprocessor 0. A breakpoint is configured by setting the seven Breakpoint registers with special MTC0/MFC0 instructions. Figure 13-1 shows the basic structure of the breakpoint hardware. A breakpoint can generate a breakpoint exception, which is categorized as a Level 2 exception and has a dedicated exception vector (see 5. Exception). This exception is masked only in Level 2 mode, and exception generation itself can be controlled by the Breakpoint Control Register described in the following section. Note that some breakpoint exceptions are imprecise. For instance, a data value breakpoint on a load instruction is inherently imprecise because the load instruction may retire from the pipeline before the memory contents are actually acquired. The imprecise cases are:
* All data value breakpoints on load instructions
* Data value breakpoints on the SWC1 instruction
13.1.1 Hardware Breakpoint Signal
To signal a breakpoint occurrence, the C790 asserts a signal called TRIG whenever a trigger condition is met.
* TRIG (Trigger Output)    Output
  This signal is asserted for two BUSCLK cycles when a trigger condition is met.
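[Block diagram not reproduced. The address/value registers (IAB, DAB, DVB) and the mask registers (IABM, DABM, DVBM) feed masked comparators (=?) against the fetch PC, the load/store address and the load/store value, respectively. A match produces a breakpoint event. Under the enable control of the Breakpoint Control register (BPC), the event drives the trigger to the external probe (TRIG*) and the exception input of the pipeline control (exception control).]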
Figure 13-1. Overall Structure of Hardware Breakpoint
13.2 Breakpoint Registers
The hardware breakpoint function comprises three pairs of breakpoint registers and one control register, listed below. Each breakpoint register pair consists of one breakpoint value register and one breakpoint mask register.
* Breakpoint Control Register (BPC)
* Instruction Address Breakpoint Registers
    Instruction Address Breakpoint Register (IAB)
    Instruction Address Breakpoint Mask Register (IABM)
* Data Address Breakpoint Registers
    Data Address Breakpoint Register (DAB)
    Data Address Breakpoint Mask Register (DABM)
* Data Value Breakpoint Registers
    Data Value Breakpoint Register (DVB)
    Data Value Breakpoint Mask Register (DVBM)
All seven registers are 32-bit read/write registers assigned to Coprocessor 0 register 24. The C790 therefore provides extended MTC0/MFC0 instructions for accessing them, and these instructions must be used instead of the conventional MTC0/MFC0 instructions. Table 13-1 and Table 13-2 summarize the instructions for accessing the registers.
Table 13-1. Set a new value into breakpoint registers
Mnemonic    Operation
MTBPC       Move to Breakpoint Control Register
MTIAB       Move to Instruction Address Breakpoint Register
MTIABM      Move to Instruction Address Breakpoint Mask Register
MTDAB       Move to Data Address Breakpoint Register
MTDABM      Move to Data Address Breakpoint Mask Register
MTDVB       Move to Data Value Breakpoint Register
MTDVBM      Move to Data Value Breakpoint Mask Register
Table 13-2. Get the value from breakpoint registers
Mnemonic    Operation
MFBPC       Move from Breakpoint Control Register
MFIAB       Move from Instruction Address Breakpoint Register
MFIABM      Move from Instruction Address Breakpoint Mask Register
MFDAB       Move from Data Address Breakpoint Register
MFDABM      Move from Data Address Breakpoint Mask Register
MFDVB       Move from Data Value Breakpoint Register
MFDVBM      Move from Data Value Breakpoint Mask Register
13.2.1 Breakpoint Control Register (BPC)
The BPC register contains enable bits and status bits for controlling the breakpointing of both instructions and data. This register consists of five bit fields:
* Breakpoint overall control (bits [31:28])
  These bits control the operation mode of the breakpointing.
* Instruction breakpoint control (bits [26:23])
  These bits specify the processor modes in which the instruction breakpoint is enabled.
* Data breakpoint control (bits [21:18])
  These bits specify the processor modes in which the data breakpoint is enabled.
* Signaling control (bits [17:15])
  These bits control breakpoint exception / trigger generation upon a breakpoint event.
* Breakpoint status (bits [2:0])
  These bits indicate the type of breakpoint event. The breakpoint exception handler uses them to identify which breakpoint event occurred.
The following shows the detailed bitmap of the BPC register.
Bit    31   30   29   28   27  26   25   24   23   22  21   20   19   18   17   16   15   14-3  2    1    0
Field  IAE  DRE  DWE  DVE  0   IUE  ISE  IKE  IXE  0   DUE  DSE  DKE  DXE  ITE  DTE  BED  0     DWB  DRB  IAB
Table 13-3 describes the BPC register fields.
Table 13-3. BPC Register Fields
Field  Bits  Type          Initial Value
IAE    31    Read / Write  0
    Instruction Address Enable. This bit enables/disables instruction address breakpointing.
    0: disable instruction address breakpointing
    1: enable instruction address breakpointing
DRE    30    Read / Write  0
    Data Read Enable. This bit enables data load address breakpointing.
    0: disable breakpointing on reads
    1: enable breakpointing on reads
DWE    29    Read / Write  0
    Data Write Enable. This bit enables data store address breakpointing.
    0: disable breakpointing on writes
    1: enable breakpointing on writes
DVE    28    Read / Write  Undefined
    Data Value Enable. This bit is valid only when DRE and/or DWE are set to 1. When DVE is set to 1, data read breakpoints (DRE == 1) are further qualified by the value of the data read, and data write breakpoints (DWE == 1) are further qualified by the value of the data written. Note that data value breakpoints for data reads are imprecise. See section 13.1 ("Hardware Breakpoint") for more details.
rsvd   27    Read          0
    Reserved - must be written as zeros by software. The processor returns zeros in these bit positions when read.
IUE    26    Read / Write  Undefined
    Instruction break - User Enable. This bit enables instruction address breakpointing in (standard) user mode. This bit is only valid if IAE is set to 1.
    0: disable instruction address breakpointing in User mode
    1: enable instruction address breakpointing in User mode
ISE    25    Read / Write  Undefined
    Instruction break - Supervisor Enable. This bit enables instruction address breakpointing in supervisor mode. This bit is only valid if IAE is set to 1.
    0: disable instruction address breakpointing in Supervisor mode
    1: enable instruction address breakpointing in Supervisor mode
IKE    24    Read / Write  Undefined
    Instruction break - Kernel Enable. This bit enables instruction address breakpointing in non-exception kernel mode, i.e. when both STATUS.EXL and STATUS.ERL are 0. This bit is only valid if IAE is set to 1.
    0: disable instruction address breakpointing in Kernel mode
    1: enable instruction address breakpointing in Kernel mode
IXE    23    Read / Write  Undefined
    Instruction break - EXL mode Enable. This bit enables instruction address breakpointing in exception kernel mode, i.e. when STATUS.EXL is 1 and STATUS.ERL is 0. This bit is only valid if IAE is set to 1.
    0: disable instruction address breakpointing in EXL mode
    1: enable instruction address breakpointing in EXL mode
rsvd   22    Read          0
    Reserved - must be written as zeros by software. The processor returns zeros in these bit positions when read.
DUE    21    Read / Write  Undefined
    Data break - User Enable. This bit enables data breakpointing in User mode. This bit is only valid if DWE or DRE is set to 1.
    0: disable data breakpointing in User mode
    1: enable data breakpointing in User mode
DSE    20    Read / Write  Undefined
    Data break - Supervisor Enable. This bit enables data breakpointing in Supervisor mode. This bit is only valid if DWE or DRE is set to 1.
    0: disable data breakpointing in Supervisor mode
    1: enable data breakpointing in Supervisor mode
DKE    19    Read / Write  Undefined
    Data break - Kernel Enable. This bit enables data breakpointing in Kernel mode, i.e. when both STATUS.EXL and STATUS.ERL are 0. This bit is only valid if DWE or DRE is set to 1.
    0: disable data breakpointing in Kernel mode
    1: enable data breakpointing in Kernel mode
DXE    18    Read / Write  Undefined
    Data break - EXL mode Enable. This bit enables data breakpointing in Exception Kernel mode, i.e. when STATUS.EXL is 1 and STATUS.ERL is 0. This bit is only valid if at least one of DRE or DWE is set to 1.
    0: disable data breakpointing in EXL mode
    1: enable data breakpointing in EXL mode
ITE    17    Read / Write  Undefined
    Instruction Trigger Enable. This bit enables the generation of the trigger signal when an instruction breakpoint occurs.
    0: disable instruction breakpoint trigger
    1: enable instruction breakpoint trigger
DTE    16    Read / Write  Undefined
    Data Trigger Enable. This bit enables the generation of the trigger signal when a data breakpoint occurs.
    0: disable data breakpoint trigger
    1: enable data breakpoint trigger
BED    15    Read / Write  Undefined
    Breakpoint Exception Disable. This bit disables the entry into the debug exception handler. Note that the setting of this bit does not affect trigger signal generation.
    0: enable entry into debug exception handler
    1: disable entry into debug exception handler
rsvd   14-3  Read          0
    Reserved - must be written as zeros by software. The processor returns zeros in these bit positions when read.
DWB    2     Read / Write  Undefined
    Data Write Breakpoint. This status bit indicates whether a data breakpoint has occurred on a write or not.
    0: no data breakpoint has occurred on a write
    1: data breakpoint has occurred on a write
DRB    1     Read / Write  Undefined
    Data Read Breakpoint. This status bit indicates whether a data breakpoint has occurred on a read or not.
    0: no data breakpoint has occurred on a read
    1: data breakpoint has occurred on a read
IAB    0     Read / Write  Undefined
    Instruction Address Breakpoint. This status bit indicates whether an instruction address breakpoint has occurred or not.
    0: no instruction address breakpoint has occurred
    1: instruction address breakpoint has occurred
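The bit assignments in Table 13-3 can be captured in a small lookup table when building or inspecting BPC values on a host. The following Python sketch is only an illustration: the names `BPC_BITS`, `BPC_RESERVED_MASK`, `bpc_encode` and `bpc_decode` are hypothetical, and only the bit positions come from the table.

```python
# Bit positions taken from Table 13-3.
BPC_BITS = {
    "IAE": 31, "DRE": 30, "DWE": 29, "DVE": 28,
    "IUE": 26, "ISE": 25, "IKE": 24, "IXE": 23,
    "DUE": 21, "DSE": 20, "DKE": 19, "DXE": 18,
    "ITE": 17, "DTE": 16, "BED": 15,
    "DWB": 2, "DRB": 1, "IAB": 0,
}

# Reserved bits 27, 22 and 14-3 must be written as zero.
BPC_RESERVED_MASK = (1 << 27) | (1 << 22) | (0xFFF << 3)


def bpc_encode(*fields):
    """Return a BPC word with the named single-bit fields set to 1."""
    value = 0
    for name in fields:
        value |= 1 << BPC_BITS[name]
    return value


def bpc_decode(value):
    """Return the names of the fields set in value, highest bit first."""
    names = [n for n, bit in BPC_BITS.items() if (value >> bit) & 1]
    return sorted(names, key=lambda n: -BPC_BITS[n])
```

For example, `bpc_encode("IAE", "IUE", "ISE")` gives the word that enables instruction address breakpointing in user and supervisor mode with the breakpoint exception left enabled (BED = 0).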
13.2.2 Instruction Address Breakpoint Register (IAB) / Instruction Address Breakpoint Mask Register (IABM)
Bits 31-2: IAB        Bits 1-0: 0
Figure 13-2. Instruction Address Breakpoint Register
Bits 31-2: IABM       Bits 1-0: 0
Figure 13-3. Instruction Address Breakpoint Mask Register
This register pair holds the instruction breakpoint address. Both the value in the IAB register and the current fetch PC are masked by the value in IABM; if the masked values are equal, the instruction address breakpoint condition becomes true. Because the fetch PC is always word-aligned, bits 0 and 1 of these registers are fixed to zero.
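The masked comparison described above can be modeled in a few lines. This Python sketch is a host-side illustration only; the function name `masked_match` is hypothetical, and a mask bit of 1 is taken to mean "compare this bit", as the sample code in 13.3 states.

```python
def masked_match(value, reference, mask):
    """True when value and reference agree on every bit that is 1 in mask."""
    return (value & mask) == (reference & mask)


# Example: IAB = 0x12345678 with IABM = 0xffffff00 matches any fetch PC
# from 0x1234_5600 to 0x1234_56ff.
IAB, IABM = 0x12345678, 0xFFFFFF00
hits = [pc for pc in (0x123455FC, 0x12345600, 0x123456FC, 0x12345700)
        if masked_match(pc, IAB, IABM)]
```

The same function applies unchanged to the DAB/DABM and DVB/DVBM pairs.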
13.2.3 Data Address Breakpoint Register (DAB) / Data Address Breakpoint Mask Register (DABM)
This register pair holds the data breakpoint address. Both the value in the DAB register and the target address of the load/store operation are masked by the value in DABM; if the masked values are equal, the data address breakpoint condition becomes true. These registers are 32 bits wide and are readable and writable.
Bits 31-0: DAB
Figure 13-4. Data Address Breakpoint Register
Bits 31-0: DABM
Figure 13-5. Data Address Breakpoint Mask Register
13.2.4 Data Value Breakpoint Register (DVB) / Data Value Breakpoint Mask Register (DVBM)
This register pair holds the value for data value breakpointing. Both the value in DVB and the lower 32 bits of the load/store data are masked with the value in DVBM; if the masked values are equal, the data value breakpoint condition becomes true. Note that enabling the data value breakpoint presupposes that data address breakpointing is active (either or both of the DRE/DWE bits in BPC are set). A data value breakpoint event therefore occurs only if both the data address condition and the data value condition are true. Note also that the data value comparison is always performed on 32 bits regardless of the width of the load/store operation: a store value coming from a GPR is truncated to 32 bits for the comparison, while a load value is first sign-extended or merged with the contents of the GPR (in unaligned cases) as appropriate, and then its least significant 32 bits are compared. For instance, the most significant 96 (64+32) bits are discarded for LQ/SQ and the most significant 32 bits for LD/SD, while for LB/LH the value from memory is sign-extended to form a 32-bit value.
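The 32-bit comparison rule can be illustrated with a short host-side model. This Python sketch assumes the behavior exactly as described in the paragraph above; the helper names `sign_extend` and `value_match_32` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def value_match_32(data, dvb, dvbm):
    """Compare only the least significant 32 bits of data against DVB under DVBM."""
    low32 = data & 0xFFFFFFFF
    return (low32 & dvbm) == (dvb & dvbm)


# SD of a 64-bit value: the upper 32 bits are ignored by the comparison.
sd_hit = value_match_32(0xDEADBEEFBABECAFE, 0xBABECAFE, 0x0000FFFF)

# LB of the byte 0xFE: the comparison sees the sign-extended value 0xFFFFFFFE.
lb_hit = value_match_32(sign_extend(0xFE, 8), 0xFFFFFFFE, 0xFFFFFFFF)
```

Both `sd_hit` and `lb_hit` are true in this model; a byte load of 0xFE does not match DVB = 0x000000FE under a full mask, because of the sign extension.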
Bits 31-0: DVB
Figure 13-6. Data Value Breakpoint Register
Bits 31-0: DVBM
Figure 13-7. Data Value Breakpoint Mask Register
13.3 Setting Breakpoint
The following sections describe the details of breakpoint control, with sample code. Because the C790 is a pipelined superscalar processor, several restrictions apply when setting the breakpoint registers. The main points that require care are:
* When changing the breakpoint configuration, it is very likely that three or more registers must be updated. Because the C790 is a pipelined processor, these updates take effect in a pipelined manner, which can open a hazardous window in which an unintended exception is generated.
* The C790 does NOT wait for data to arrive on a load operation. The instruction itself may retire from the pipeline before the data is stored into the register, so the breakpoint event occurs later than instruction completion. This not only makes some data value breakpoints imprecise but can also temporarily mask a breakpoint event, as in the following case: a data load instruction that should cause a data value breakpoint exception misses in the cache, and in the next cycle another Level 2 exception, such as an SIO interrupt, is detected, so that the processor enters Level 2 mode before the data is acquired. In this scenario the data value breakpoint exception is delayed until the processor returns from Level 2 mode.
13.3.1 Sequence of Setting Breakpoint
To prevent spurious exceptions while the breakpoint is being reconfigured, the breakpoint enables must be managed before and after the change. One easy way is to switch the processor to Level 2 mode, which masks the breakpoint exception unconditionally, but this has the side effect that the user segment becomes unmapped. This section therefore focuses on changing the configuration without changing the processor mode. The sequence for changing the breakpoint configuration is:
1. Synchronize the pipeline.
2. Disable the breakpoint exception that is going to be reconfigured.
3. Synchronize the pipeline.
4. Set the appropriate data in the Breakpoint register pairs.
5. Set the appropriate configuration in the Breakpoint Control Register, including enabling the breakpoint exception.
6. Synchronize the pipeline.
There are three synchronization points in the sequence. The first ensures that no breakpoint exception is pending, for consistency in the breakpoint exception handler. The second comes right after disabling the breakpoint that is going to be reconfigured; it separates the change to the control register from the changes to the other breakpoint registers so that the programmer can change the breakpoint safely. The third comes after the Breakpoint Control Register is updated. Because the C790 issues instructions in order, changes to a breakpoint register pair always precede the change to the control register, so no spurious exception occurs even without this synchronization. However, to catch a breakpoint event right after the control register is updated, flushing the pipeline at this point is strongly recommended.
The first synchronization must be either SYNC.P or SYNC.L, depending on the breakpoint being reconfigured: SYNC.P for an instruction breakpoint, SYNC.L otherwise. The second and third synchronizations use SYNC.P.
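The six steps and the choice of synchronization instruction can be written down as a checklist generator, for instance for a tool that emits or checks reconfiguration sequences. This Python sketch is only an illustration of the ordering described above; the function name `reconfiguration_steps` and the step descriptions are hypothetical.

```python
def reconfiguration_steps(kind):
    """Return the ordered steps for reconfiguring a breakpoint.

    kind is "instruction" or "data"; it selects the first synchronization
    (SYNC.P for an instruction breakpoint, SYNC.L otherwise).
    """
    if kind not in ("instruction", "data"):
        raise ValueError("kind must be 'instruction' or 'data'")
    first_sync = "SYNC.P" if kind == "instruction" else "SYNC.L"
    return [
        first_sync,                                  # 1. no pending breakpoint
        "disable the breakpoint in BPC",             # 2.
        "SYNC.P",                                    # 3. separate BPC change
        "write the value/mask register pair(s)",     # 4.
        "write BPC with the new configuration",      # 5.
        "SYNC.P",                                    # 6. recommended flush
    ]
```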
The flow for generating TRIG* and the exception is shown in Figure 13-8, Figure 13-9 and Figure 13-10. Figure 13-8 describes the flow when the hardware breakpoint logic encounters a breakpointing event. Figure 13-9 and Figure 13-10 describe how the exception and the TRIG* signal are asserted. The sections after the figures show some simple sample code for configuring the breakpoint registers. Programming notes are given in the comments.
[Flowchart not reproduced. Breakpointing configuration check:
1. If Status.ERL = 1 (Level 2 mode): no breakpoint event.
2. Otherwise, if Status.EXL = 1 (Level 1 mode): continue only if I/DXE = 1; otherwise no breakpoint event.
3. Otherwise, select by Status.KSU (2 bits): Kernel (00b) requires I/DKE = 1, Supervisor (01b) requires I/DSE = 1, User (10b) requires I/DUE = 1; otherwise no breakpoint event.
4. If the check passes, continue with "Checking Breakpoint Event".]
Figure 13-8. Hardware Breakpoint detection flow (Setting)
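The mode check of Figure 13-8 can be sketched as a small function. This Python model is an illustration under assumptions: the enable-bit positions come from Table 13-3, the KSU encodings from the figure, and KSU = 11b (not covered by the figure) is treated as "no breakpoint event". The name `mode_allows_breakpoint` is hypothetical.

```python
KSU_KERNEL, KSU_SUPERVISOR, KSU_USER = 0b00, 0b01, 0b10

# BPC enable-bit positions per mode (Table 13-3).
INSN_ENABLES = {"U": 26, "S": 25, "K": 24, "X": 23}   # IUE, ISE, IKE, IXE
DATA_ENABLES = {"U": 21, "S": 20, "K": 19, "X": 18}   # DUE, DSE, DKE, DXE


def mode_allows_breakpoint(bpc, erl, exl, ksu, data=False):
    """Return True if the current mode lets breakpoint checking proceed."""
    bits = DATA_ENABLES if data else INSN_ENABLES
    if erl:
        return False                 # Level 2 mode: no breakpoint event
    if exl:
        key = "X"                    # Level 1 mode: I/DXE decides
    elif ksu == KSU_KERNEL:
        key = "K"
    elif ksu == KSU_SUPERVISOR:
        key = "S"
    elif ksu == KSU_USER:
        key = "U"
    else:
        return False                 # assumption: KSU = 11b never matches
    return bool((bpc >> bits[key]) & 1)
```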
[Flowchart not reproduced. Checking Breakpoint Event (Instruction):
1. Mask the instruction address and the value in IAB, then compare them. If they are not equal: no breakpoint event.
2. If IAE is not 1: no breakpoint event. Otherwise set the IAB status bit to 1 and signal the breakpoint.
3. If BPC.ITE = 1, assert TRIG*.
4. If BPC.BED = 1, no exception is generated (End); otherwise a breakpoint exception is generated.]
Figure 13-9. Hardware Breakpoint detection flow (IAB)
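The instruction breakpoint flow of Figure 13-9 can be modeled as follows. This Python sketch assumes the mode check of Figure 13-8 has already passed; the bit positions come from Table 13-3, and the function name `instruction_breakpoint` and its return convention are hypothetical.

```python
IAE, ITE, BED = 1 << 31, 1 << 17, 1 << 15


def instruction_breakpoint(pc, iab, iabm, bpc):
    """Return (event, trig, exception) for one fetch PC.

    event:     the masked compare matched and IAE is set
    trig:      TRIG* would be asserted (ITE = 1)
    exception: a breakpoint exception would be generated (BED = 0)
    """
    if (pc & iabm) != (iab & iabm):
        return (False, False, False)   # addresses differ under the mask
    if not (bpc & IAE):
        return (False, False, False)   # instruction breakpointing disabled
    return (True, bool(bpc & ITE), not (bpc & BED))
```

Note that `trig` and `exception` are independent in this model: with ITE = 1 and BED = 1 the probe is triggered without entering the exception handler.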
[Flowchart not reproduced. Checking Breakpoint Event (Data):
1. Mask the data address and the value in DAB, then compare them. If they are not equal: no breakpoint event.
2. If BPC.DVE = 1, also mask the data value and the value in DVB and compare them. If they are not equal: no breakpoint event.
3. For a read: if DRE = 1, set DRB = 1; otherwise no breakpoint event. For a write: if DWE = 1, set DWB = 1; otherwise no breakpoint event.
4. Signal the breakpoint.]
Figure 13-10. Hardware Breakpoint detection flow (DAB/DVB) (1/2)
[Flowchart not reproduced. After the breakpoint is signaled:
5. If BPC.DTE = 1, assert TRIG*.
6. If BPC.BED = 1, no exception is generated (End); otherwise a breakpoint exception is generated.]
Figure 13-10. Hardware Breakpoint detection flow (DAB/DVB) (2/2)
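The data breakpoint flow of Figure 13-10 can be modeled in the same style. This Python sketch assumes that the mode check of Figure 13-8 has already passed and that the data trigger is gated by DTE as described in Table 13-3; the function name `data_breakpoint` and its return convention are hypothetical.

```python
DRE, DWE, DVE = 1 << 30, 1 << 29, 1 << 28
DTE, BED = 1 << 16, 1 << 15
DWB, DRB = 1 << 2, 1 << 1


def data_breakpoint(addr, data, is_read, dab, dabm, dvb, dvbm, bpc):
    """Return (status, trig, exception); status == 0 means no event.

    status is the BPC status bit that would be set (DRB or DWB).
    """
    no_event = (0, False, False)
    if (addr & dabm) != (dab & dabm):
        return no_event                       # address mismatch
    if (bpc & DVE) and ((data & 0xFFFFFFFF) & dvbm) != (dvb & dvbm):
        return no_event                       # value mismatch
    if is_read:
        if not (bpc & DRE):
            return no_event
        status = DRB
    else:
        if not (bpc & DWE):
            return no_event
        status = DWB
    return (status, bool(bpc & DTE), not (bpc & BED))
```

Writing zero to DABM makes the address comparison always succeed in this model, so only the value condition remains.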
13.3.2 Instruction Breakpointing
The following code sets an instruction breakpoint from 0x1234_5600 to 0x1234_56ff, and traps if the processor is either in user mode or in supervisor mode.
------------------------------------------------------------------
#
# Setting instruction address breakpoint from 0x1234_5600 to 0x1234_56ff
# in user mode and supervisor mode
#
        sync.p                  # 1st sync: barrier to ensure there is no
                                # pending instruction address breakpoint
                                # in the pipe. Flushing the pipeline is
                                # sufficient for this purpose.
#
# First, disable instruction breakpointing to avoid spurious exceptions.
# The following conservative sequence leaves the data breakpoint
# configuration untouched.
#
        mfbpc   $4              # get the value in BPC
        bgez    $4, 1f          # skip the following if BPC[31] == 0
        nop                     # (branch delay slot)
        li      $5, (1 << 31)   # IAE is bit 31 of BPC
        xor     $4, $5, $4      # reset the IAE bit to zero
        mtbpc   $4              # reload BPC
        sync.p                  # 2nd sync: barrier to ensure the
                                # configuration change of the breakpoint
                                # function
1:
#
# Reconfigure the instruction breakpoint address.
# The least significant 8 bits can be anything because they are masked
# by the IABM register anyway.
#
        li      $4, 0x12345678
        mtiab   $4
#
# Set the mask register. A bit is masked (ignored) if the corresponding
# bit in the mask register is zero.
#
        li      $5, 0xffffff00
        mtiabm  $5
#
# Reconfigure the instruction breakpoint. For clarity, first reset all
# instruction breakpoint bits, then set the new configuration.
#
        mfbpc   $4
#
# Reset IUE (bit 26), ISE (25), IKE (24), IXE (23), ITE (17), BED (15)
# and IAB (0). Resetting IAB is especially important so that the cause
# of the next breakpoint exception can be identified correctly.
#
        li      $5, ~((1 << 26) | (1 << 25) | (1 << 24) | (1 << 23) | (1 << 17) | (1 << 15) | (1 << 0))
        and     $4, $4, $5
#
# Set the new configuration in the BPC register.
# Setting BPC after IAB/IABM is important to avoid spurious exceptions.
#   IAE (bit 31) = 1 enables instruction breakpointing.
#   IUE (bit 26) = 1 enables it in user mode.
#   ISE (bit 25) = 1 enables it in supervisor mode.
#   BED (bit 15) stays 0 so that the breakpoint exception is generated.
#
        li      $6, ((1 << 31) | (1 << 26) | (1 << 25))
        or      $5, $4, $6
        mtbpc   $5
        sync.p                  # 3rd sync: barrier to ensure the
                                # configuration change
------------------------------------------------------------------
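The value/mask pair in the listing covers the range 0x1234_5600 to 0x1234_56ff because that range is a power of two in size and aligned to its size. As a host-side aid, the following Python sketch derives such a pair from a range; the function name `range_to_value_mask` is hypothetical, and ranges that are not aligned powers of two are rejected because a single value/mask pair cannot describe them exactly.

```python
def range_to_value_mask(first, last):
    """Return (value, mask) covering the inclusive range first..last.

    The range must have a power-of-two size and be aligned to that size.
    """
    size = last - first + 1
    if size <= 0 or size & (size - 1) or first % size:
        raise ValueError("range must be a power-of-two size, aligned to its size")
    mask = ~(size - 1) & 0xFFFFFFFF
    return first & mask, mask
```

Any value whose masked bits equal the returned value works equally well in the breakpoint register, which is why the listing can load 0x12345678 into IAB.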
13.3.3 Data Address Breakpointing
The following code sets a data address breakpoint from 0x1230_0000 to 0x1233_ffff for both reading and writing, and traps if the processor is in kernel mode (including Level 1).
------------------------------------------------------------------
#
# Setting data address breakpoint from 0x1230_0000 to 0x1233_ffff
# in kernel (normal, L1) mode
#
        sync.l                  # 1st sync: barrier to ensure there is no
                                # pending data address breakpoint in the
                                # pipe. SYNC.L is required to flush all
                                # load/store buffers.
#
# First, reset the data-breakpoint related bits to zero:
# DRE (bit 30), DWE (29), DVE (28), DUE (21), DSE (20), DKE (19),
# DXE (18), DTE (16), BED (15), DWB (2) and DRB (1).
# Resetting DWB/DRB is important so that the handler can identify the
# next breakpoint exception correctly.
#
        mfbpc   $4              # load current configuration
        li      $5, ~((1 << 30) | (1 << 29) | (1 << 28) | (1 << 21) | (1 << 20) | (1 << 19) | (1 << 18) | (1 << 16) | (1 << 15) | (1 << 2) | (1 << 1))
        and     $4, $4, $5
        mtbpc   $4              # reload BPC
        sync.p                  # 2nd sync: barrier to ensure the
                                # configuration change of the breakpoint
                                # function
#
# Reconfigure the data breakpoint address.
# The least significant 18 bits can be anything because they are masked
# by the DABM register anyway.
#
        li      $6, 0x12305678
        mtdab   $6
#
# Set the mask register. A bit is masked (ignored) if the corresponding
# bit in the mask register is zero.
#
        li      $5, 0xfffc0000
        mtdabm  $5
#
# Set the new configuration in the BPC register.
# Setting BPC after DAB/DABM is important to avoid spurious exceptions.
#   DRE (bit 30) = 1 enables the data breakpoint on reads.
#   DWE (bit 29) = 1 enables the data breakpoint on writes.
#   DKE (bit 19) = 1 enables the data breakpoint in kernel mode.
#   DXE (bit 18) = 1 enables the data breakpoint under Level 1.
#   BED (bit 15) stays 0 so that the breakpoint exception is generated.
#
        li      $6, ((1 << 30) | (1 << 29) | (1 << 19) | (1 << 18))
        or      $5, $4, $6      # $4 still holds the value written by
                                # the earlier MTBPC
        mtbpc   $5
        sync.p                  # 3rd sync: barrier to ensure the
                                # configuration change
------------------------------------------------------------------
13.3.4 Breakpointing by Data Address and Value
Setting Data Address and Value breakpoint is the same as Data Address breakpoint. The following example is the same as the previous example except in that the trap only happens if the data contains 0xCAFE in least significant 16 bits, and traps only on loading data.
------------------------------------------------------------------
#
# Setting data address/value breakpoint from 0x1230_0000 to 0x1233_ffff
# with data that contains 0xCAFE in kernel (normal, L1) mode.
#
# 1st sync.
        sync.l                  # A barrier to ensure there is no pending
                                # data address breakpoint in pipe.
                                # Must flush all buffers for load/store for
                                # this purpose by SYNC.L
#
# At first, reset data-breakpoint related bits to zeros.
# Resetting DWB/DRB is important so that the handler can recognize the
# next breakpoint exception correctly.
#
        mfbpc   $4              # load current configuration
        li      $5, ~(          \
              ( 1 << 30 )       # DRE \
            | ( 1 << 29 )       # DWE \
            | ( 1 << 28 )       # DVE \
            | ( 1 << 21 )       # DUE \
            | ( 1 << 20 )       # DSE \
            | ( 1 << 19 )       # DKE \
            | ( 1 << 18 )       # DXE \
            | ( 1 << 16 )       # DTE \
            | ( 1 << 2 )        # DWB \
            | ( 1 << 1 )        # DRB \
            )
        and     $4, $4, $5
        mtbpc   $4              # reload BPC.
# 2nd sync.
        sync.p                  # barrier to ensure the configuration change
                                # of breakpoint function
#
# Reconfigure data breakpoint address.
# Note that the least significant 18 bits can be anything because they
# are masked by the DABM register anyway.
#
        li      $6, 0x1233ffff
        mtdab   $6
#
# Set the mask register. A bit is masked if the corresponding bit in the
# mask register is reset to zero.
#
        li      $5, 0xfffc0000
        mtdabm  $5
#
# Configure data value.
# Note that the most significant 16 bits can be anything because they
# are masked by the DVBM register anyway.
#
        li      $6, 0xbabecafe
        mtdvb   $6
#
# Set the mask register. A bit is masked if the corresponding bit in the
# mask register is reset to zero.
#
        li      $5, 0x0000ffff
        mtdvbm  $5
#
# Set the new configuration in the BPC register.
# Note that setting BPC after DAB/DABM is important to avoid a spurious
# exception.
#
        li      $6, (           \
              ( 1 << 30 )       # DRE = 1 to enable Data B.P on read        \
            | ( 1 << 28 )       # DVE = 1 to enable Data value B.P          \
            | ( 1 << 19 )       # DKE = 1 to enable Data B.P in kern. mode. \
            | ( 1 << 18 )       # DXE = 1 to enable Data B.P under L1.      \
            | ( 1 << 15 )       # BED = 1 to enable generating exception.   \
            )
        or      $5, $4, $6      # Note that $4 still holds the value used
                                # on MTBPC.
        mtbpc   $5
# 3rd sync.
        sync.p                  # Barrier to ensure the configuration change
------------------------------------------------------------------
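The BPC values built with `li` in the listings can be checked on a host machine. The following is a minimal C sketch, not vendor code: the bit positions are copied from the listings in this chapter, while the macro names `BPC_*`, the helper names `bpc_clear_data_bits` and `bpc_enable`, and the assumption that BPC is handled as a 32-bit value are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* BPC bit positions as they appear in the listings of this chapter. */
#define BPC_DRE (UINT32_C(1) << 30) /* Data B.P on read                   */
#define BPC_DWE (UINT32_C(1) << 29) /* Data B.P on write                  */
#define BPC_DVE (UINT32_C(1) << 28) /* Data value B.P                     */
#define BPC_DUE (UINT32_C(1) << 21) /* named in the listing only          */
#define BPC_DSE (UINT32_C(1) << 20) /* named in the listing only          */
#define BPC_DKE (UINT32_C(1) << 19) /* Data B.P in kernel mode            */
#define BPC_DXE (UINT32_C(1) << 18) /* Data B.P under L1                  */
#define BPC_DTE (UINT32_C(1) << 16) /* trigger enable for data B.P        */
#define BPC_BED (UINT32_C(1) << 15) /* per the listing comment            */
#define BPC_DWB (UINT32_C(1) << 2)  /* status bit cleared before re-arming */
#define BPC_DRB (UINT32_C(1) << 1)  /* status bit cleared before re-arming */

/* Every bit that the listings reset before reconfiguring DAB/DABM. */
#define BPC_DATA_BITS                                              \
    (BPC_DRE | BPC_DWE | BPC_DVE | BPC_DUE | BPC_DSE | BPC_DKE |   \
     BPC_DXE | BPC_DTE | BPC_DWB | BPC_DRB)

/* First step of the listings: the value written back with the first MTBPC. */
uint32_t bpc_clear_data_bits(uint32_t bpc)
{
    return bpc & ~BPC_DATA_BITS;
}

/* Last step of the listings: OR the new enables into the cleared value. */
uint32_t bpc_enable(uint32_t cleared_bpc, uint32_t enable)
{
    /* The listings always start from a value with the data bits reset. */
    assert((cleared_bpc & BPC_DATA_BITS) == 0);
    return cleared_bpc | enable;
}
```

With these definitions, the enable mask of the address/value example (DRE, DVE, DKE, DXE, BED) evaluates to 0x500C8000, and the mask of the address-only example (DRE, DWE, DKE, DXE, BED) to 0x600C8000.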
13.3.5 Data Value Breakpointing
A data value breakpoint can be configured to trap on the data value alone by setting the DABM register to zero and configuring the data breakpoint in "Data Address and Value" mode.
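Why a zero DABM turns the address/value mode into a value-only breakpoint can be seen from a small host-side model of the masked compare. This is a sketch under stated assumptions, not a description of the C790 logic: it assumes that a bit takes part in the comparison only where the mask bit is 1 (as the listing comments state), and that addresses and data are compared as 32-bit values. The names `data_bp_regs`, `masked_equal` and `data_bp_hits` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the four data breakpoint registers. */
typedef struct {
    uint32_t dab;   /* data address to compare            */
    uint32_t dabm;  /* address mask: 0 bits are ignored   */
    uint32_t dvb;   /* data value to compare              */
    uint32_t dvbm;  /* value mask: 0 bits are ignored     */
} data_bp_regs;

/* True if observed and reference agree on every bit selected by mask. */
static bool masked_equal(uint32_t observed, uint32_t reference, uint32_t mask)
{
    return ((observed ^ reference) & mask) == 0;
}

/* Assumed match condition in "Data Address and Value" mode. */
bool data_bp_hits(const data_bp_regs *r, uint32_t addr, uint32_t data)
{
    return masked_equal(addr, r->dab, r->dabm) &&
           masked_equal(data, r->dvb, r->dvbm);
}
```

In this model, the register values of the previous example (DAB = 0x1233ffff, DABM = 0xfffc0000, DVB = 0xbabecafe, DVBM = 0x0000ffff) match a load of 0x0000cafe from 0x12300004 but not the same load from 0x12340000. With DABM = 0 the address term is always true, so only the data value decides.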
13.4 Triggering External Probes
There is one dedicated pad that makes breakpoints visible outside the C790. This pad, the TRIG* signal, is asserted for two cycles whenever a breakpoint event is detected. Trigger signal generation is enabled by setting the ITE/DTE bit in the BPC register to 1. Note that assertion of the TRIG* signal is not completely synchronized with the occurrence of the exception: the TRIG* signal is connected directly to the internal breakpoint detection logic, while an exception, including a breakpoint exception, always occurs along with the retirement of an instruction. Therefore, the timing of the assertion of the TRIG* signal and that of the exception may differ. In particular, if the breakpoint is detected right before entering Level 2 mode and the breakpoint exception is taken imprecisely, the exception may be masked because of the processor's mode change even though the TRIG* signal has already been asserted.
13.5 Important notice on using hardware breakpoint
One important issue not mentioned in this section is that breakpoint detection does not take the ASID into account. This implies not only that software has to handle the ASID on context switches in order to apply breakpoints to a specific process, but also that an imprecise breakpoint exception may be detected after or in the middle of a context switch. In that case, it may become difficult to identify which process the breakpoint exception belongs to. This can be avoided by executing a SYNC.L instruction right before changing the ASID. (Since all imprecise breakpoint events relate to load/store instructions, executing SYNC.L works as a barrier.)
Related to this issue, as briefly described in section 13.3, a breakpoint exception may be delayed by the handling of another Level 2 exception, even though the breakpoint exception actually comes first in instruction order. In that case, because the C790 generates the breakpoint exception after the processor returns from Level 2 [1], there is no possibility of missing the breakpoint. However, if the program needs to ensure the order of occurrence between Level 2 exceptions, software has to take care of it (i.e. every Level 2 handler has to check for the occurrence of a breakpoint first). Similarly, if a Level 2 exception DOES NOT return to where the exception was detected, software has to make sure that the breakpoint condition is reset.
[1] C790 tracks the occurrence of breakpoint exception until the breakpoint exception is taken.
INDEX
A
ABS .... 2-18, 11-6, D-4
ABS.fmt .... 3-21, 10-14, D-41
AbsoluteValue .... D-4
ADD .... 2-18, 3-15, 5-26, A-11, A-141
ADD. .... D-5
ADD.fmt .... 3-21, 10-14, D-41
ADDI .... 3-14, 5-26, A-12, A-141, B-163, C-41, D-40
ADDIU .... 3-14, A-12, A-13, A-141, B-163, C-41, D-40
AddressError .... A-58, A-67, A-68, A-70, A-79, A-94, A-103, A-116
ADDU .... 3-15, A-11, A-14, A-141
AdEL .... 4-20, 5-8, 5-15
AdES .... 4-20, 5-8, 5-15
AGNT .... 8-5, 8-11, 8-14, 8-15
alignment .... 2-7, 2-16, 3-8, 6-1, A-2, A-6, A-7, A-60, A-64, A-72, A-76, A-95, A-99, A-117, A-121, B-10, B-162
ALU .... 2-3, 2-10, 2-11, 2-12, 2-13, 3-14
AND .... 3-14, 3-15, 3-25, A-3, A-15, A-16, A-141, B-4, B-48, C-39, C-40
ANDI .... 3-14, A-16, A-141, B-163, C-41, D-40
arbiter .... 8-2, 8-14, 8-15
AREQ .... 8-11, 8-14, 8-15
ASID .... 2-15, 4-5, 4-8, 4-14, 5-16, 5-17, 5-18, 6-2, 6-3, 6-4, 6-9, 6-10, 6-12, 6-13, 6-16, 6-18, 13-20, C-38
Associativity .... 2-17
B
BadPAddr .... 2-15, 4-5, 4-17, 4-25, 5-19, 8-25
BadVAddr .... 2-15, 4-5, 4-9, 4-12, 5-15, 5-16, 5-17, 5-18
BadVPN2 .... 4-9
BC0 .... C-41, C-42
BC0F .... 3-20, C-2, C-41, C-42
BC0FL .... 3-20, C-3, C-42
BC0T .... 3-20, C-4, C-42
BC0TL .... 3-20, C-5, C-42
BC1 .... D-40
BC1F .... 3-21, 10-15, D-6, D-8, D-40
BC1T .... 3-21, 10-15, D-7, D-8, D-40
BD2 .... 4-19, 4-33, 5-5, 5-12, 5-13, 5-14, 5-25, 9-10
BdPAddr .... 4-25
BDS .... 4-29, 9-6, 9-8
BE .... 4-23
BED .... 13-6, 13-15, 13-16, 13-19
BEM .... 4-16, 4-17, 4-25, 5-9, 5-11, 5-19, 8-25, A-61, A-62, A-65, A-66, A-73, A-74, A-77, A-78, A-97, A-98, A-101, A-102, A-119, A-120, A-123, A-124
BEQ .... 3-17, A-17, A-141, B-163, C-41, D-40
BEQL .... 3-17, A-18, A-141, B-163, C-41, D-40
BEV .... 4-16, 4-17, 5-7, 5-11, 5-12, 5-15, 5-16, 5-17, 5-18, 5-19, 5-20, 5-21, 5-22, 5-23, 5-24, 5-26, 5-27, 5-28, 12-6
BFH .... C-6
BGEZ .... 3-18, A-19, A-142
BGEZAL .... 3-18, A-20, A-142
BGEZALL .... 3-18, A-21, A-142
BGEZL .... 3-18, A-22, A-142
BGTZ .... 3-17, A-23, A-141, B-163, C-41, D-40
BGTZL .... 3-17, A-24, A-141, B-163, C-41, D-40
BHINBT .... C-6
BHT .... 1-2, 2-3, 2-6, 2-7, 4-31, C-10
BIU .... 2-4
BLEZ .... 3-17, A-25, A-141, B-163, C-41, D-40
BLEZL .... 3-17, A-26, A-141, B-163, C-41, D-40
BLTZ .... 3-18, A-27, A-142
BLTZAL .... 3-18, A-28, A-142
BLTZALL .... 3-18, A-29, A-142
BLTZL .... 3-18, A-30, A-142
BNE .... 3-17, A-31, A-141, B-163, C-41, D-40
BNEL .... 3-17, A-32, A-141, B-163, C-41, D-40
bootstrapping .... 5-11
BPC .... 4-26, 5-11, 13-3, 13-4, 13-5, 13-8, 13-14, 13-16, 13-18, 13-19, 13-20
BPE .... 4-23, 5-11, C-9
BR .... 2-3, 2-11, 2-12, 3-26
branch likely .... 2-13, 9-10
BREAK .... 2-11, 3-18, 5-10, 5-21, 9-7, A-33, A-39, A-141, B-8, B-67
breakpoint .... 1-2, 2-19, 3-18, 5-10, 5-11, 5-14, 5-19, 12-1, 13-1, 13-2, 13-3, 13-4, 13-6, 13-7, 13-8, 13-9, 13-14, 13-16, 13-18, 13-19, 13-20, A-33
breakpoints .... 12-1, 13-5, 13-8, A-2
BTAC .... 1-2, 2-3, 2-6, 2-7, 4-29, 4-31, 9-6, 9-7, 9-8, C-6, C-7, C-9, C-10, C-11, C-13, C-28
BUSERR .... 5-19, 8-10, 8-25, 8-26, 8-27, 8-28, 8-29
BXLBT .... C-6
BXSBT .... C-6
C
C.cond.D .... D-8
C.cond.fmt .... 3-21, 10-15, D-41
C.cond.fmt. .... D-6, D-7, D-41
C.cond.S .... D-8
Cache .... 1-2, 2-1, 2-3, 2-6, 2-7, 2-15, 2-17, 2-18, 3-20, 4-5, 4-17, 4-29, 8-2, 8-8, 9-7, 9-9, A-6, A-7, C-6, C-7, C-8, C-9, C-13
CACHE .... 2-11, 2-13, 2-17, 3-20, 4-17, 4-23, 4-31, 4-32, 5-19, A-141, B-163, C-6, C-7, C-8, C-9, C-10, C-11, C-12, C-13, C-41, D-40
CacheOp .... C-7
CAUSE .... 8-13, 9-10
CCR .... 9-2, 9-5, 9-10, 9-11, A-3
CE .... 4-19, 4-23, 5-2, 5-23
CEIL. .... D-12
CEIL.L.fmt .... 3-21, 10-14, D-41
CEIL.W .... D-13
CEIL.W.fmt .... 3-21, 10-14, D-41
CFC1 .... 3-21, 10-13, 11-9, D-14, D-40
CH .... 4-16, 4-17
coherency .... 2-18, 4-8, 4-24, 6-12, 6-16, 8-2
Coherency .... 6-17
Config .... 2-15, 4-5, 4-23, 5-11, 6-7, 6-12, C-9
CONFIG .... 9-10, C-28
consistency .... 13-9
Context .... 2-15, 4-5, 4-9, 5-15, 5-16, 5-17, 5-18
contexts .... 6-3
ConvertFmt .... D-2, D-16, D-17, D-18, D-19, D-23, D-24
COP0 .... 2-7, 2-11, 2-12, 2-13, 2-15, 3-2, 3-20, 4-1, 4-5, 4-16, 4-17, 4-22, 4-28, 5-23, 6-1, 6-3, 6-14, 8-25, 9-2, 9-3, 9-11, A-4, A-141, A-142, B-163, C-1, C-7, C-9, C-10, C-11, C-12, C-14, C-15, C-17, C-18, C-19, C-20, C-21, C-22, C-23, C-24, C-25, C-26, C-27, C-28, C-29, C-30, C-31, C-32, C-33, C-34, C-35, C-36, C-41, C-42, D-40
COP1 .... 2-3, 2-4, 2-7, 2-8, 2-10, 2-11, 2-12, 2-13, 2-14, 3-2, 3-21, 4-29, 9-6, 9-7, A-8, A-125, A-141, A-142, B-163, C-16, C-41, D-1, D-2, D-27, D-29, D-40, D-41
coprocessor .... 2-4, 2-7, 2-8, 2-16, 3-5, 3-21, 4-16, 4-17, 5-11, 5-23, 6-1, 10-2, A-4, A-5, A-142, C-1, C-2, C-3, C-4, C-5, C-14, C-15, C-18, C-28, D-1, D-14, D-15, D-21, D-26
Coprocessor .... 1-1, 1-5, 2-11, 2-15, 3-2, 3-5, 3-16, 3-20, 3-21, 4-1, 4-5, 4-16, 4-19, 4-20, 5-2, 5-8, 5-9, 5-10, 5-23, 6-1, 6-14, 8-10, 8-11, 13-2, A-3, A-4, A-5, A-8, A-141, A-142, C-1, C-2, C-3, C-4, C-5, C-7, C-16, C-17, C-18, C-19, C-20, C-21, C-22, C-23, C-24, C-25, C-26, C-27, C-28, C-29, C-30, C-31, C-32, C-33, C-34, C-35, C-36, C-37, C-38, C-39, C-40, D-4, D-5, D-6, D-7, D-11, D-12, D-13, D-14, D-15, D-16, D-17, D-18, D-19, D-20, D-21, D-22, D-23, D-24, D-25, D-26, D-27, D-28, D-29, D-30, D-31, D-32, D-33, D-34, D-35, D-36, D-37, D-38, D-39
Coprocessor0 .... 13-4
Count .... 2-15, 3-25, 4-5, 4-13, 4-15, 5-24, B-4, B-5
counter .... 2-15, 2-16, 2-19, 3-17, 4-5, 4-17, 4-18, 4-19, 4-28, 4-30, 4-33, 5-5, 5-9, 5-13, 6-1, 9-1, 9-2, 9-3, 9-5, 9-6, 9-8, 9-10, 9-11, C-28, C-35
Counter .... 2-3, 2-15, 2-19, 3-20, 4-1, 4-2, 4-3, 4-4, 4-5, 4-19, 4-21, 4-28, 4-29, 4-30, 5-2, 5-7, 5-8, 5-9, 5-10, 5-11, 5-13, 9-1, 9-2, 9-3, 9-4, 9-5, 9-6, 9-10, 9-11, 12-6, A-4, C-25, C-26, C-35
CPCOND .... A-3
CPCOND0 .... 8-10, 8-11, C-2, C-3, C-4, C-5
CPR .... A-3, C-17, C-18, C-19, C-20, C-21, C-22, C-23, C-24, C-25, C-26, C-27, C-28, C-29, C-30, C-31, C-32, C-33, C-34, C-35, C-36
CPUADDR .... 8-3, 8-7, 8-9
CPUASTART .... 8-3, 8-7, 8-8, 8-9, 8-12, 8-13, 8-16, 8-19
CPUBE .... 8-3, 8-7, 8-9
CPUCLK .... 8-11
CPUDATA .... 8-3, 8-7, 8-9, 8-17, 8-20
CPUDSTART .... 8-3, 8-10, 8-12, 8-13, 8-16, 8-17, 8-19, 8-20, 8-26, 8-28
CPURD .... 8-3, 8-8, 8-9
CPUTRANSTYPE .... 8-8
CPUTSIZE .... 8-3, 8-9, 8-12, 8-13, 8-16, 8-19
CPUWR .... 8-3, 8-8, 8-9
CTC1 .... 3-21, 10-7, 10-8, 10-9, 10-13, 11-9, D-15, D-40
CTE .... 4-28, 4-29, 5-11, 9-2, 9-4, 9-5, 9-10, 9-11
CTR0 .... 4-29, 9-10, 9-11
CTR1 .... 4-29, 9-10, 9-11
CU .... 1-5, 3-5, 3-20, 3-21, 4-16, 4-17, C-1, C-14, C-15
CU0 .... 5-23, C-7
CVT .... 3-26
CVT.D .... D-16
CVT.D.fmt .... 3-21, 10-14, D-41
CVT.L. .... D-17
CVT.L.fmt .... 3-21, 10-14, D-41
CVT.S .... D-18
CVT.S.fmt .... 3-21, 10-14, D-41
CVT.W.fmt .... 3-21, 10-14, D-41
CVT.W.S .... D-19
D
DAB .... 4-27, 13-3, 13-7, 13-12, 13-16, 13-19
DABM .... 4-27, 13-3, 13-7, 13-16, 13-18, 13-19
DADD .... 3-15, 5-26, A-34, A-141
DADDI .... 3-14, 5-26, A-35, A-141, B-163, C-41, D-40
DADDIU .... 3-14, A-35, A-36, A-141, B-163, C-41, D-40
DADDU .... 3-15, A-34, A-37, A-141
DBE .... 4-20, 5-8, 5-19
DC .... 4-23
DCE .... 4-23, 5-11, 9-7, C-9, C-28
DDIV .... 3-4, 3-14, A-142, B-165, C-42, D-41
DDIVU .... 3-4, 3-14, A-142, B-165, C-42, D-41
debug .... 3-20, 4-17, 4-18, 4-19, 4-26, 4-33, 5-10, 5-14, 13-6
DEBUG .... 5-14
DEC .... 3-6
decoupling .... 2-4
Demultiplexed .... 2-18, 8-2
DEV .... 4-16, 4-17, 5-7, 5-13, 5-14, 5-25, 9-10, 12-6
DHIN .... C-6
DHWBIN .... C-6
DHWOIN .... C-6
DI .... 3-20, 4-16, 4-17, 5-23, C-1, C-14, C-15, C-42
DIE .... 4-23, 4-24, 5-11
dirty .... 4-8, 5-18, 6-16, 8-12, A-91, C-11, C-12
Dirty .... 4-8, 4-32, 5-11, 6-16, C-11, C-12, C-13
dispatches .... 3-17
displacement .... 3-3, A-9
DIV .... 2-18, 3-16, 3-26, A-38, A-40, A-80, A-141, D-20
DIV.fmt .... 3-21, 10-14, D-41
DIV1 .... 2-14, 3-23, 3-26, 4-2, B-3, B-7, B-9, B-163
Divide .... 1-1, 2-6, 3-14, 3-16, 3-21, 3-22, 3-23, 3-24, 3-26, 4-1, B-3, B-5, B-8
DIVU .... 3-16, 3-26, A-40, A-141
DIVU1 .... 2-14, 3-23, 3-26, 4-2, B-3, B-9, B-163
DKE .... 13-6, 13-16, 13-18, 13-19
DMA .... 8-1, 8-3, 8-6, 8-7, 8-10, 8-12, 8-13, 8-14, 8-25, 8-26
DMAC .... 8-1, 8-3, 8-10, 8-11, 8-13, 8-14, 8-25, 8-26
DMFC1 .... 3-21, 10-13, D-21, D-40
DMTC1 .... 3-21, 10-13, D-22, D-40
DMULT .... 3-4, 3-14, A-142, B-165, C-42, D-41
DMULTU .... 3-4, 3-14, A-142, B-165, C-42, D-41
doubleword .... 3-5, 3-8, 3-9, 5-15, A-4, A-5, A-6, A-34, A-37, A-41, A-42, A-43, A-44, A-45, A-46, A-47, A-48, A-49, A-50, A-51, A-58, A-59, A-60, A-63, A-64, A-72, A-94, A-95, A-96, A-99, A-100, A-118, A-122, B-2, B-64, B-65, B-72, B-74, B-78, B-79, B-80, B-81, B-82, B-83, B-89, B-93, B-95, B-113, B-120, B-122, B-128, B-129, B-130
DRB .... 13-6, 13-16, 13-18
DRE .... 5-11, 13-5, 13-6, 13-8, 13-16, 13-18, 13-19
DSE .... 13-6, 13-16, 13-18
DSLL .... 3-15, A-41, A-141
DSLL32 .... 3-15, A-42, A-141
DSLLV .... 3-15, A-43, A-141
DSRA .... 3-15, A-44, A-141
DSRA32 .... 3-15, A-45, A-141
DSRAV .... 3-15, A-46, A-141
DSRL .... 3-15, A-47, A-141
DSRL32 .... 3-15, A-48, A-141
DSRLV .... 3-15, A-49, A-141
DSUB .... 3-15, 5-26, A-50, A-141
DSUBU .... 3-15, A-50, A-51, A-141
DTE .... 13-6, 13-16, 13-18, 13-20
DTLB .... 2-3, 2-6, 2-16, 4-29, 9-6, 9-8
DUE .... 13-6, 13-16, 13-18
DVB .... 4-27, 13-3, 13-8, 13-12
DVBM .... 4-27, 13-3, 13-8, 13-18
DVE .... 13-5, 13-16, 13-18, 13-19
DWB .... 13-6, 13-16, 13-18
DWE .... 5-11, 13-5, 13-6, 13-8, 13-16, 13-18
DXE .... 13-6, 13-16, 13-18, 13-19
DXIN .... C-6
DXLDT .... C-6
DXLTG .... C-6
DXSDT .... C-6
DXSTG .... C-6
DXWBIN .... C-6
E
EC ................................................................................................................................................................. 4-23 EDI .................................................................................................................. 4-16, 4-17, 5-23, C-1, C-14, C-15 Edian............................................................................................................................................................. 4-23 EI.................................................................................................. 3-20, 4-16, 4-17, 5-23, C-1, C-14, C-15, C-42 EIE .................................................................................................................4-16, 4-17, 4-18, 5-24, C-14, C-15 endian .................. 3-5, 3-6, 3-7, 3-9, 3-10, 3-11, 3-12, 3-13, A-3, A-6, A-61, A-62, A-65, A-66, A-73, A-74, A-77, A-78, A-97, A-98, A-101, A-102, A-119, A-120, A-123, A-124 endianess ....................................................................................................................................................... 3-9
Endianness .............................................................................................................................................. 1-2, 3-5 EntryHi .................... 2-15, 4-5, 4-14, 5-15, 5-16, 5-17, 5-18, 6-2, 6-3, 6-4, 6-15, C-28, C-37, C-38, C-39, C-40 EntryHI .......................................................................................................................................................... 6-16 EntryHi7 ........................................................................................................................................................C-37 EntryLo........................................................................................5-15, 5-16, 5-17, 5-18, 6-15, C-38, C-39, C-40 EntryLo0................................................................................ 2-15, 4-5, 4-8, 5-16, 6-15, 6-16, C-38, C-39, C-40 EntryLo1................................................................................ 2-15, 4-5, 4-8, 5-16, 6-15, 6-16, C-38, C-39, C-40 EPC...................... 2-6, 2-15, 4-5, 4-21, 4-33, 5-2, 5-3, 5-15, 5-16, 5-17, 5-18, 5-19, 5-20, 5-21, 5-22, 5-23, 5-26, 5-27, 11-9, C-16 ERET ............2-11, 2-12, 2-13, 3-20, 4-4, 5-5, 5-24, 6-11, 9-7, 9-11, 12-2, 12-5, C-16, C-38, C-39, C-40, C-42 ERL ...................... 4-16, 4-17, 4-18, 5-5, 5-9, 5-11, 5-12, 5-13, 5-14, 5-19, 5-24, 5-25, 6-6, 6-7, 6-8, 6-9, 6-10, 6-11, 6-12, 9-2, 9-10, 9-11, 13-5, 13-6, C-14, C-15, C-16 ERL0 ............................................................................................................................................................... 9-5 ERL1 ............................................................................................................................................................... 9-5 Error ..................... 
2-6, 2-15, 4-5, 4-12, 4-17, 4-18, 5-2, 5-10, 5-15, 5-19, 5-23, 6-6, 6-7, 6-9, 8-13, 8-25, 8-26, 8-28, A-2, A-54, A-55, A-56, A-57, A-58, A-62, A-66, A-67, A-68, A-70, A-74, A-78, A-79, A-93, A-94, A-98, A-102, A-103, A-116, A-120, A-124, B-10, B-162, C-7, C-8, D-26, D-34, D-37 ErrorEPC...............................................................................4-33, 5-5, 5-12, 5-13, 5-14, 5-25, 9-10, 9-11, C-16 ErrorPC .................................................................................................................................................. 2-15, 4-5 EVENT ............................................................................................................................................................ 9-5 EVENT0 ................................................................................................................ 4-28, 4-29, 9-2, 9-5, 9-6, 9-11 EVENT1 ........................................................................................................................4-28, 4-29, 9-5, 9-6, 9-11 EXC2....................................................................................... 4-19, 5-5, 5-8, 5-11, 5-12, 5-13, 5-14, 5-25, 9-10 ExcCode ................ 4-19, 4-20, 5-2, 5-8, 5-15, 5-16, 5-17, 5-18, 5-19, 5-20, 5-21, 5-22, 5-23, 5-24, 5-26, 5-27 exception.............. 
2-15, 2-16, 2-18, 2-19, 3-2, 3-5, 3-16, 3-18, 3-20, 4-4, 4-5, 4-9, 4-12, 4-14, 4-16, 4-17, 4-18, 4-19, 4-20, 4-21, 4-29, 4-33, 5-1, 5-2, 5-3, 5-5, 5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15, 5-16, 5-17, 5-18, 5-19, 5-20, 5-21, 5-22, 5-23, 5-24, 5-25, 5-26, 5-27, 6-1, 6-2, 6-4, 6-6, 6-9, 6-11, 6-14, 6-15, 6-16, 6-17, 6-20, 8-13, 8-25, 9-2, 9-7, 9-8, 9-10, 9-11, 10-8, 11-2, 11-3, 12-1, 12-2, 12-3, 12-5, 12-6, 12-7, 12-14, 12-15, 12-16, 12-17, 12-18, 12-19, 12-20, 13-2, 13-4, 13-5, 13-6, 13-8, 13-9, 13-14, 13-15, 13-16, 13-18, 13-19, 13-20, A-2, A-6, A-8, A-11, A-12, A-13, A-14, A-20, A-21, A-28, A-29, A-33, A-34, A-35, A-36, A-37, A-38, A-39, A-40, A-50, A-51, A-54, A-55, A-58, A-67, A-68, A-70, A-86, A-87, A-91, A-92, A-94, A-103, A-106, A-107, A-108, A-109, A-114, A-115, A-116, A-126, A-127, A-128, A-129, A-130, A-131, A-132, A-133, A-134, A-135, A-136, A-137, A-138, A-142, B-7, B-8, B-9, B-11, B-12, B-13, B-14, B-20, B-21, B-22, B-23, B-25, B-27, B-28, B-66, B-67, B-68, B-70, B-71, B-84, B-86, B-91, B-93, B-95, B-111, B-113, B-118, B-120, B-122, B-165, C-1, C-2, C-3, C-4, C-5, C-7, C-8, C-16, C-17, C-18, C-19, C-20, C-21, C-22, C-23, C-24, C-25, C-26, C-27, C-28, C-29, C-30, C-31, C-32, C-33, C-34, C-35, C-36, C-37, C-38, C-39, C-40, C-42, D-26, D-37, D-41 Exception ............. 2-6, 2-11, 2-15, 2-19, 3-18, 3-20, 3-21, 4-5, 4-18, 4-20, 4-21, 5-1, 5-2, 5-3, 5-4, 5-5, 5-6, 5-7,
5-8, 5-9, 5-10, 5-11, 5-12, 5-13, 5-14, 5-15, 5-16, 5-17, 5-18, 5-19, 5-20, 5-21, 5-22, 5-23, 5-24, 5-25, 5-26, 5-27, 5-28, 6-6, 6-11, 8-25, 8-26, 12-2, 12-5, 12-6, 12-7, 12-14, 12-15, 12-16, 12-17, 12-18, 13-2, 13-6, A-8, A-37, A-79, B-62, C-8 Exceptions .....................................................................................................................................................11-5 execution pipeline ..................................................................................... 2-3, 2-5, 2-10, 2-11, 2-12, 3-26, C-16 ExHnd ............................................................................................................ 12-14, 12-15, 12-16, 12-17, 12-18 ExHnd1 ............................................................................................................................................ 12-19, 12-20 ExHnd2 ............................................................................................................................................ 12-19, 12-20 EXL ...................... 4-16, 4-17, 4-18, 4-21, 4-29, 5-2, 5-5, 5-7, 5-9, 5-12, 5-16, 5-19, 5-24, 6-6, 6-8, 6-9, 6-10, 6-11, 6-12, 9-2, 12-6, 13-5, 13-6, C-14, C-15, C-16 EXL0 ......................................................................................................................................4-29, 9-2, 9-5, 9-11 EXL1 ............................................................................................................................................. 4-29, 9-5, 9-11
F
FCR...............................................................................................................................................................D-14 FCR0............................................................................................................................................................. 10-4 FCR31........................................................................................................................................ 10-4, 10-6, D-15 FCRs............................................................................................................................................................. 10-4 FetchAddress...................................................................................................................................... C-10, C-11 FGR ............................................................................................................................................................ 10-13 FGRs............................................................................................................................................................. 10-2 FLOOR.L.......................................................................................................................................................D-23 FLOOR.L.fmt ........................................................................................................................... 3-21, 10-14, D-41 FLOOR.W. ....................................................................................................................................................D-24 FLOOR.W.fmt .......................................................................................................................... 
3-21, 10-14, D-41 FP_Control..........................................................................................................................................D-14, D-15 FPE ......................................................................................................................................4-20, 5-8, 5-28, 11-3 FPR...................... 2-3, 2-9, D-2, D-4, D-5, D-8, D-12, D-13, D-16, D-17, D-18, D-19, D-20, D-21, D-22, D-23, D-24, D-26, D-27, D-28, D-30, D-31, D-32, D-33, D-35, D-36, D-37, D-38, D-39 FPRs ......................................................................................................................10-2, D-10, D-16, D-17, D-28 FPU...................... 1-2, 2-3, 2-7, 2-8, 2-14, 2-18, 4-16, 10-13, 10-14, 11-2, 11-5, 11-8, D-1, D-2, D-3, D-14, D-15, D-27, D-29 FR ...............................................................................................................................................4-16, 4-17, 10-2 funnel shift ..................................................................... 2-3, 2-14, 4-1, 4-2, 4-4, B-17, B-20, B-21, B-22, B-161 Funnel shift ....................................................................................................................................................2-11
G
gathering ............................................................................................................ 2-4, 2-19, 6-17, 9-1, A-8, A-125 General Purpose Registers ........................................................................................2-3, 4-1, 4-2, 4-3, 4-4, A-3 global bit........................................................................................................................................................ 6-18 GPR ..............................................................................................................................................................D-21 GPR10 ................................................................................................................................................ B-21, B-22
GPRLEN ......................................................................................................................................... A-3, D-6, D-7
H
HI ......................... 2-11, 2-14, 3-16, 3-22, 3-23, 3-24, 3-26, 4-1, 4-2, 4-3, 4-4, A-38, A-39, A-40, A-80, A-84, A-86, A-87, B-2, B-5, B-11, B-13, B-23, B-25, B-66, B-67, B-68, B-70, B-84, B-85, B-86, B-87, B-91, B-92, B-93, B-95, B-101, B-102, B-111, B-113, B-115, B-116, B-118, B-120, B-122 HI0 ............................................................................................................................................ 4-2, 4-3, 4-4, B-2 HI1 ................................. 2-11, 2-14, 4-2, 4-3, 4-4, B-2, B-3, B-7, B-8, B-9, B-12, B-14, B-15, B-18, B-24, B-26 hit under miss ........................................................................................................................................ 1-2, 4-23
I
IAB ...................................................................................................4-27, 13-3, 13-6, 13-7, 13-11, 13-13, 13-14 IABM ............................................................................................................................... 4-27, 13-3, 13-7, 13-14 IAE .................................................................................................................................5-11, 13-5, 13-14, 13-15 IBE ................................................................................................................................................4-20, 5-8, 5-19 IC .................................................................................................................................................................. 4-23 ICE ............................................................................................................................................... 4-23, 5-11, C-9 ID ......................................................................................................................................................... 4-14, 6-16 IE................................................................................................... 4-16, 4-17, 4-18, 5-9, 5-12, 5-24, C-14, C-15 IEEE............................2-18, 10-1, 10-8, 10-9, 10-10, 11-2, 11-3, 11-6, 11-7, 11-8, 11-9, D-8, D-12, D-13, D-19 IFL...................................................................................................................................................................C-6 IHIN.................................................................................................................................................................C-6 IKE ..................................................................................................................................................... 
13-5, 13-14 IM ............................................................................................................................... 4-13, 4-16, 4-17, 4-18, 5-9 imprecise .............................................................................................5-14, 5-19, 8-13, 13-2, 13-5, 13-8, 13-20 Index ..................... 2-15, 3-20, 4-5, 4-6, 5-18, 5-19, 6-20, C-7, C-9, C-10, C-11, C-12, C-13, C-37, C-38, C-39 INDEX .............................................................................................................................................................C-6 Index5 .................................................................................................................................................C-38, C-39 Init ..................................................................................................................................................................9-11 initialize ..........................................................................................................................................................9-11 initializing .......................................................................................................................................................5-11 Initializing .......................................................................................................................................................9-11 INT ................................................................................................................................................................ 8-10 interleave ............................................................................................................................................ B-88, B-89 interleaved .......................................................................................................................................... B-88, B-89 interrupt........ 
1-5, 3-16, 3-22, 4-13, 4-15, 4-16, 4-17, 4-19, 4-33, 5-24, 8-10, 8-13, 8-25, 8-26, 9-4, 13-8, C-16 Interrupt............... 3-20, 4-16, 4-17, 4-18, 4-19, 4-20, 5-2, 5-5, 5-7, 5-8, 5-9, 5-10, 5-12, 5-24, 8-10, 8-25, 12-6 Interrupts.............................................................................................................................................. 4-16, 4-18 INVALIDATE ...................................................................................................................................................C-6 ISE ..................................................................................................................................................... 13-5, 13-14 Issue ...................................................................................................................................................... 2-3, 2-12
issues.................................................................................................................................. 2-3, 4-24, 8-12, 13-9 ITE ..........................................................................................................................................13-6, 13-14, 13-20 ITLB ................................................................................................................................. 2-3, 2-6, 2-16, 9-6, 9-8 IUE ..........................................................................................................................................13-5, 13-14, 13-15 IV......................................................................1-1, 1-2, 1-3, 2-16, 3-2, 3-4, 3-19, 6-1, A-82, A-83, A-91, A-141 IXE ..................................................................................................................................................... 13-5, 13-14 IXIN .................................................................................................................................................................C-6 IXLDT..............................................................................................................................................................C-6 IXLTG..............................................................................................................................................................C-6 IXSDT .............................................................................................................................................................C-6 IXSTG .............................................................................................................................................................C-6
J
J ........................... 3-3, 3-17, 9-7, 12-2, A-9, A-17, A-18, A-19, A-22, A-23, A-24, A-25, A-26, A-27, A-30, A-31, A-32, A-52, A-61, A-62, A-65, A-66, A-73, A-74, A-77, A-78, A-141, B-163, C-41, D-6, D-7, D-40 JAL.................................................... 3-17, 9-7, 12-2, A-20, A-21, A-28, A-29, A-53, A-141, B-163, C-41, D-40 JALR ....................................................................... 3-17, 9-7, 12-2, 12-5, A-20, A-21, A-28, A-29, A-54, A-141 JMPA.................................................................................................................................................... 12-3, 12-4 JMPB ................................................................................................................................................... 12-3, 12-4 JR......................... 3-17, 9-7, 12-2, 12-5, A-17, A-18, A-19, A-22, A-23, A-24, A-25, A-26, A-27, A-30, A-31, A-32, A-55, A-141, D-6, D-7 JTLB......................................................................................................................................................... 9-6, 9-8
K
K0.....................................................................................4-23, 4-24, 4-29, 6-7, 6-12, 9-2, 9-5, 9-10, 9-11, C-28 KB ........................ 6-2, 6-5, A-17, A-18, A-19, A-20, A-21, A-22, A-23, A-24, A-25, A-26, A-27, A-28, A-29, A-30, A-31, A-32 Kernel................... 2-16, 2-19, 3-20, 3-26, 4-16, 4-17, 4-18, 4-29, 5-2, 5-22, 5-23, 6-1, 6-6, 6-7, 6-10, 6-11, 6-12, 6-13, 9-2, 13-5, 13-6, C-1, C-7, C-14, C-15 kseg0 .........................................................................................................................4-24, 6-7, 6-12, 9-10, C-28 kseg1 ..................................................................................................................................................... 6-7, 6-12 kseg3 .................................................................................................................... 2-16, 4-9, 6-1, 6-7, 6-12, 6-13 ksseg...................................................................................................................................................... 6-7, 6-12 KSU....................................................... 4-16, 4-17, 4-18, 5-2, 6-6, 6-8, 6-9, 6-10, 6-11, 6-12, 6-13, C-14, C-15 kuseg .....................................................................................................................................2-16, 6-1, 6-7, 6-12
L
LB...................................................................................................... 3-4, 13-8, A-56, A-141, B-163, C-41, D-40 LBU ............................................................................................................ 3-4, A-57, A-141, B-163, C-41, D-40 LD ..............................................................................................3-4, 13-8, A-5, A-58, A-141, B-163, C-41, D-40 LDC1............................................................................ 3-5, 3-21, 3-26, 10-13, A-141, B-163, C-41, D-25, D-40 LDL ..................................................................................3-4, 3-8, A-59, A-60, A-63, A-141, B-163, C-41, D-40
LDR..................................................................................3-4, 3-8, A-59, A-63, A-64, A-141, B-163, C-41, D-40 LH ..........................................................................................3-4, 13-8, A-67, A-141, B-102, B-163, C-41, D-40 LHU............................................................................................................ 3-4, A-68, A-141, B-163, C-41, D-40 li ..................................................................................................................... 13-14, 13-15, 13-16, 13-18, 13-19 Link ......................................................................................................................................2-11, 3-17, 3-18, 4-4 LL ..................................................................................................................1-2, 3-4, A-142, B-165, C-42, D-41 LLD ...............................................................................................................1-2, 3-4, A-142, B-165, C-42, D-41 LO ........................ 2-11, 2-14, 3-16, 3-22, 3-23, 3-24, 3-26, 4-1, 4-2, 4-3, 4-4, A-38, A-39, A-40, A-81, A-85, A-86, A-87, B-2, B-5, B-11, B-13, B-23, B-25, B-66, B-67, B-68, B-70, B-84, B-85, B-86, B-87, B-91, B-92, B-93, B-95, B-102, B-106, B-111, B-113, B-116, B-117, B-118, B-120, B-122 LO0 ..................................................................................................................................4-2, 4-3, 4-4, 6-16, B-2 LO1 ....................... 
2-11, 2-14, 4-2, 4-3, 4-4, 6-16, B-2, B-3, B-7, B-8, B-9, B-12, B-14, B-16, B-19, B-24, B-26 LoadMemory...............................A-6, A-56, A-57, A-58, A-60, A-64, A-67, A-68, A-70, A-72, A-76, A-79, B-10 Lock ...............................................................................................................2-17, 4-32, 5-11, C-11, C-12, C-13 Locking.......................................................................................................................................................... 2-17 logical pipe ..................................................................................................................................2-10, 2-12, 2-13 LQ .................................................................................... 3-5, 3-25, 13-8, A-141, B-4, B-10, B-163, C-41, D-40 LRF ....................................................................................................... 4-32, 5-11, C-9, C-10, C-11, C-12, C-13 LUI .................................................................................................. 3-14, 3-26, A-69, A-141, B-163, C-41, D-40 LW ................................................................................3-4, A-5, A-70, A-141, B-102, B-116, B-163, C-41, D-40 LWC1 ............................................................................3-5, 3-21, 3-26, 10-13, A-141, B-163, C-41, D-26, D-40 LWC2 .......................................................................................................................... A-142, B-165, C-42, D-41 LWL........................................................................ 3-4, 3-8, A-71, A-72, A-75, A-76, A-141, B-163, C-41, D-40 LWR ....................................................................... 
3-4, 3-8, A-71, A-72, A-75, A-76, A-141, B-163, C-41, D-40 LWU ............................................................................................................3-4, A-79, A-141, B-163, C-41, D-40 LZC .............................................................................................................................................. 2-13, B-4, B-90
M
MAC ............................................................................................................................................ 2-11, 3-16, 3-22 MAC0 .......................................................................................................................................... 2-11, 2-12, 2-13 MAC1 .......................................................................................................................................... 2-11, 2-12, 2-13 MADD ............................................................................................................3-23, 3-26, B-3, B-11, B-13, B-163 MADD1 .........................................................................................2-14, 3-23, 3-26, 4-2, B-3, B-12, B-14, B-163 MADDU................................................................................................................... 3-23, 3-26, B-3, B-13, B-163 MADDU1................................................................................................ 2-14, 3-23, 3-26, 4-2, B-3, B-14, B-163 Mask .................... 2-15, 2-19, 3-20, 4-5, 4-10, 4-16, 4-17, 4-27, 5-9, 5-24, 6-15, 13-3, 13-4, 13-7, 13-8, C-20, C-22, C-24, C-30, C-32, C-34, C-39, C-40 MASK................................................................................................................................................... 4-10, 6-16 Maskable................................................................................................................................................ 5-8, 5-12 MAX .............................................................................................................................................................. 2-18
MB...................................................................................................................... 6-2, 6-5, 6-12, 6-13, A-52, A-53 MF0...............................................................................................................................................................C-41 MFBPC ............................................................................................................................ 3-20, 13-4, C-17, C-41 MFC0 ................................................................................................................. 3-20, 4-1, 9-3, 13-2, 13-4, C-18 MFC1 ............................................................................................................................. 3-21, 10-13, D-27, D-40 MFDAB ............................................................................................................................ 3-20, 13-4, C-19, C-41 MFDABM ......................................................................................................................... 3-20, 13-4, C-20, C-41 MFDVB ............................................................................................................................ 3-20, 13-4, C-21, C-41 MFDVBM ......................................................................................................................... 3-20, 13-4, C-22, C-41 MFHI ..................................................................................................................... 2-11, 3-16, A-80, A-81, A-141 MFHI1 .....................................................................................................2-11, 2-14, 3-23, 4-2, B-3, B-15, B-163 MFIAB .............................................................................................................................. 3-20, 13-4, C-23, C-41 MFIABM ........................................................................................................................... 
3-20, 13-4, C-24, C-41 MFLO ..............................................................................................................................3-16, 3-23, A-81, A-141 MFLO1 .............................................................................................................2-14, 3-23, 4-2, B-3, B-16, B-163 MFPC.......................................................................................................................... 3-20, 9-2, 9-3, C-25, C-41 MFPS .......................................................................................................................... 3-20, 9-2, 9-3, C-26, C-41 MFSA .................................................................................................. 3-25, A-141, B-5, B-17, B-20, B-21, B-22 MIN ............................................................................................................................................................... 2-18 Misaligned....................................................................................................................................................... 3-8 misalignment...................................................................................................................................................C-8 mispredicted ............................................................................................................................................ 
9-6, 9-7 Miss................................................................................................................2-17, 4-17, 6-4, 8-8, 9-7, 9-8, 12-6 misses.............................................................................................................................................1-1, 6-17, 9-9 MMI .............................................................................................5-22, A-141, B-163, B-164, B-165, C-41, D-40 MMI0 ............................................................................................................................................... B-163, B-164 MMI1 ............................................................................................................................................... B-163, B-164 MMI2 ............................................................................................................................................... B-163, B-165 MMI3 ............................................................................................................................................... B-163, B-165 MMU ..................................................................................................................... 2-3, 2-15, 2-16, 4-5, 6-1, 6-14 mod ......................................................................................................... A-38, A-40, B-7, B-9, B-66, B-68, B-70 MOV.....................................................................................................................................................11-6, D-28 MOV. fmt ....................................................................................................................................................... 10-8 MOV.fmt ................................................................................................................................... 
3-21, 10-14, D-41 Move1 ............................................................................................................................................................2-11 MOVN ...................................................................................................................................... 3-19, A-82, A-141 MOVZ....................................................................................................................................... 3-19, A-83, A-141 MT0...............................................................................................................................................................C-41 MTBPC ......................................................................................................3-20, 13-4, 13-16, 13-19, C-27, C-41 MTC0 ................................................................................................................. 3-20, 4-1, 9-3, 13-2, 13-4, C-28
MTC1 .................................................................................................................... 3-21, 3-26, 10-13, D-29, D-40 MTDAB ............................................................................................................................ 3-20, 13-4, C-29, C-41 MTDABM ......................................................................................................................... 3-20, 13-4, C-30, C-41 MTDVB ............................................................................................................................ 3-20, 13-4, C-31, C-41 MTDVBM ......................................................................................................................... 3-20, 13-4, C-32, C-41 MTHI ............................................................................................................................... 2-11, 3-16, A-84, A-141 MTHI1 .....................................................................................................2-11, 2-14, 3-23, 4-2, B-3, B-18, B-163 MTIAB .............................................................................................................................. 3-20, 13-4, C-33, C-41 MTIABM ........................................................................................................................... 3-20, 13-4, C-34, C-41 MTLO ....................................................................................................................................... 3-16, A-85, A-141 MTLO1 .............................................................................................................2-14, 3-23, 4-2, B-3, B-19, B-163 MTPC.......................................................................................................................... 3-20, 9-2, 9-3, C-35, C-41 MTPS .......................................................................................................................... 
3-20, 9-2, 9-3, C-36, C-41 MTSA ............................................................................................................ 2-13, 3-25, A-141, B-5, B-17, B-20 MTSAB......................................................................... 2-13, 3-25, A-141, A-142, B-5, B-20, B-21, B-22, B-161 MTSAH .................................................................................. 2-13, 3-25, A-141, A-142, B-5, B-20, B-22, B-161 MTSAx ..........................................................................................................................................................B-20 MUL .................................................................................................................................................... 2-18, D-30 MUL.fmt ............................................................................................................................................. 3-21, 10-14 MUL.mft ........................................................................................................................................................D-41 MULT ...................................................................... 3-16, 3-23, 3-26, A-80, A-86, A-87, A-141, B-3, B-23, B-25 MULT1 ..........................................................................................2-14, 3-23, 3-26, 4-2, B-3, B-24, B-26, B-163 Multi ................................................................................................................................................................ 1-2 Multimaster ............................................................................................................................................ 2-18, 8-2 multimedia.................................................................................................. 1-1, 1-2, 2-3, 2-6, 3-2, 3-4, 3-5, 3-23 Multimedia........................................................................... 
2-3, 2-14, 3-5, 3-22, 3-23, 3-24, 3-26, 4-2, B-1, B-3 multiply................. 2-14, 3-2, 3-4, 3-16, 3-22, 3-23, 4-1, 4-2, 4-4, A-8, A-86, A-87, A-125, B-11, B-12, B-13, B-14, B-23, B-24, B-25, B-26, B-84, B-85, B-86, B-87, B-91, B-92, B-93, B-95, B-111, B-113, B-118, B-120, B-122, C-16, D-30 Multiply................ 1-1, 1-2, 2-3, 2-6, 2-9, 2-11, 3-2, 3-14, 3-16, 3-21, 3-22, 3-23, 3-24, 3-26, 4-1, B-1, B-3, B-5 MULTU................................................................................................. 3-16, 3-23, 3-26, A-87, A-141, B-3, B-25 MULTU1................................................................................................. 2-14, 3-23, 3-26, 4-2, B-3, B-26, B-163
N
NaN..................................................................................................... 10-11, 11-6, D-8, D-10, D-11, D-12, D-13 NaNs ............................................................................................................................................................. 2-18 NBE............................................................................................................................................ 4-23, 5-11, C-28 NEG ........................................................................................................................................... 2-18, 11-6, D-31 NEG.fmt ................................................................................................................................... 3-21, 10-14, D-41 Negate ..............................................................................................................3-21, 8-3, D-2, D-31, D-32, D-33 NMI .............................. 4-17, 4-18, 4-19, 4-33, 5-2, 5-5, 5-7, 5-8, 5-9, 5-10, 5-12, 8-10, 8-13, 9-11, 12-6, C-14
nonmaskable ................................................................................................................................................ 4-33 NOR .....................................................................................................3-15, 3-25, A-3, A-88, A-141, B-4, B-124 Normalization .................................................................................................................................................. 2-9 NOT ...............................................................................................................6-2, 13-8, 13-20, A-3, A-88, B-124 NotWordValue...... A-11, A-12, A-13, A-14, A-38, A-40, A-86, A-87, A-110, A-111, A-112, A-113, A-114, A-115, B-7, B-9, B-11, B-12, B-13, B-14, B-23, B-24, B-25, B-26, B-68, B-70, B-93, B-95, B-113, B-120, B-122 NullifyCurrentInstruction ............................................A-8, A-18, A-21, A-22, A-24, A-26, A-29, A-30, A-32, C-5
O
Offset ....................................................................6-4, 6-5, A-62, A-66, A-74, A-78, A-98, A-102, A-120, A-124 opcode ...........................................................................................................................2-16, 3-9, 5-22, 6-1, A-2 OpCode................ 3-23, 3-24, 3-25, 6-20, 9-3, A-141, A-142, B-163, B-164, B-165, C-6, C-25, C-26, C-35, C-36, C-41, C-42, D-40, D-41 operand................................................................. 1-2, 3-14, 3-22, 3-23, A-104, B-1, B-3, D-1, D-4, D-31, D-35 Operand .......................................................................................................................2-4, 3-14, 3-15, 3-23, B-3 OR..................... 2-9, 3-14, 3-15, 3-25, A-3, A-88, A-89, A-90, A-139, A-140, A-141, B-4, B-124, B-125, B-160 ORI............................................................................................................3-14, A-90, A-141, B-163, C-41, D-40 Ov .................................................................................................................................................4-20, 5-8, 5-26 Overflow............... 2-9, 4-30, 5-2, 5-8, 5-26, A-11, A-12, A-13, A-14, A-34, A-35, A-36, A-37, A-50, A-51, A-106, A-107, A-108, A-109, A-114, B-31, B-35, B-37, B-39, B-42, B-44, B-144, B-148, B-150 OVERFLOW ................................................................................................................................................... 5-5 OVFL.......................................................................................................................... 4-28, 4-30, 9-2, 9-10, 9-11
P
P0EXEA ............................................................................................................................................... 12-3, 12-4 P0EXEB ............................................................................................................................................... 12-3, 12-4 P1EXEA ............................................................................................................................................... 12-3, 12-4 P1EXEB ............................................................................................................................................... 12-3, 12-4 PA ......................................................................................................................C-6, C-7, C-9, C-10, C-11, C-12 PABSH ............................................................................................................................. 3-24, B-4, B-27, B-164 PABSW ............................................................................................................................ 3-24, B-4, B-28, B-164 PADDB............................................................................................................................. 3-24, B-3, B-29, B-164 PADDH............................................................................................................................. 3-24, B-3, B-30, B-164 PADDSB .......................................................................................................................... 3-24, B-3, B-31, B-164 PADDSH .......................................................................................................................... 3-24, B-3, B-35, B-164 PADDSW ......................................................................................................................... 
3-24, B-3, B-37, B-164 PADDUB .......................................................................................................................... 3-24, B-3, B-39, B-164 PADDUH .......................................................................................................................... 3-24, B-3, B-42, B-164 PADDUW ......................................................................................................................... 3-24, B-3, B-44, B-164 PADDW ............................................................................................................................ 3-24, B-3, B-46, B-164 PADSBH .......................................................................................................................... 3-24, B-3, B-47, B-164
Page.................................................................................................................... 2-16, 4-8, 4-10, 6-16, 6-17, 9-7 PageMask ........................................................................... 2-15, 4-5, 4-10, 6-14, 6-15, 6-16, C-38, C-39, C-40 PAND ............................................................................................................................... 3-25, B-4, B-48, B-165 PC ........................ 1-2, 2-3, 2-6, 2-19, 3-16, 3-17, 3-18, 4-1, 4-3, 4-4, 5-12, 9-10, 12-1, 12-2, 12-3, 12-5, 12-7, 12-8, 12-9, 12-10, 12-11, 12-12, 12-13, 12-14, 12-15, 12-16, 12-17, 12-18, 12-19, 12-20, 13-7, A-4, A-9, A-17, A-18, A-19, A-20, A-21, A-22, A-23, A-24, A-25, A-26, A-27, A-28, A-29, A-30, A-31, A-32, A-52, A-53, A-54, A-55, C-2, C-3, C-4, C-5, C-16, D-6, D-7 PC tracing ........................................................................................................................... 1-2, 2-19, 12-1, 12-3 PCEQB ............................................................................................................................ 3-25, B-4, B-49, B-164 PCEQH ............................................................................................................................ 3-25, B-4, B-52, B-164 PCEQW ........................................................................................................................... 3-25, B-4, B-54, B-164 PCGTB............................................................................................................................. 3-25, B-4, B-56, B-164 PCGTH ............................................................................................................................ 3-25, B-4, B-59, B-164 PCGTW ........................................................................................................................... 
3-25, B-4, B-61, B-164 PCPYH............................................................................................................................. 3-25, B-5, B-63, B-165 PCPYLD........................................................................................................................... 3-25, B-5, B-64, B-165 PCPYUD .......................................................................................................................... 3-25, B-5, B-65, B-165 PDIVBW........................................................................................................ 3-24, B-5, B-66, B-69, B-71, B-165 PDIVUW .......................................................................................................................... 3-24, B-5, B-68, B-165 PDIVW ............................................................................................................................. 3-24, B-5, B-70, B-165 Perf ........................................................................................................................................................ 2-15, 4-5 PerfC.............................................................................................................................................4-19, 5-8, 5-13 Performance ........ 1-2, 2-1, 2-15, 2-19, 3-20, 4-5, 4-17, 4-19, 4-28, 4-29, 4-30, 5-2, 5-5, 5-7, 5-8, 5-9, 5-10, 5-11, 5-13, 9-1, 9-2, 9-3, 9-4, 9-10, 12-6, C-25, C-26, C-35, C-36 performance monitor..................................................................................................................................... 3-20 PEXCH............................................................................................................................. 3-25, B-5, B-72, B-165 PEXCW............................................................................................................................ 
3-25, B-5, B-73, B-165 PEXEH............................................................................................................................. 3-25, B-5, B-74, B-165 PEXEW ............................................................................................................................ 3-25, B-5, B-75, B-165 PEXT5.............................................................................................................................. 3-25, B-5, B-76, B-164 PEXTLB ........................................................................................................................... 3-25, B-5, B-78, B-164 PEXTLH ........................................................................................................................... 3-25, B-5, B-79, B-164 PEXTLW .......................................................................................................................... 3-25, B-5, B-80, B-164 PEXTUB........................................................................................................................... 3-25, B-5, B-81, B-164 PEXTUH .......................................................................................................................... 3-25, B-5, B-82, B-164 PEXTUW ......................................................................................................................... 3-25, B-5, B-83, B-164 PFN...................................................................................... 2-15, 4-5, 4-8, 6-16, C-10, C-11, C-12, C-39, C-40 PHMADH ......................................................................................................................... 3-24, B-5, B-84, B-165 PHMSBH.......................................................................................................................... 3-24, B-5, B-86, B-165 Physical................................................................2-10, 2-15, 2-16, 4-5, 4-25, 6-3, 6-4, 6-18, A-4, A-6, A-7, C-7
PINTEH............................................................................................................................ 3-25, B-5, B-88, B-165 PINTH .............................................................................................................................. 3-25, B-5, B-89, B-165 PLZCW ............................................................................................................................ 3-25, B-4, B-90, B-163 PMADDH ............................................ 3-24, B-5, B-91, B-94, B-96, B-112, B-114, B-119, B-121, B-123, B-165 PMADDUW ...................................................................................................................... 3-24, B-5, B-93, B-165 PMADDW ........................................................................................................................ 3-24, B-5, B-95, B-165 PMAXH ............................................................................................................................ 3-24, B-4, B-97, B-164 PMAXW ........................................................................................................................... 3-24, B-4, B-99, B-164 PMFHI............................................................................................................................ 3-24, B-5, B-101, B-165 PMFHL........................................................................................................................... 3-24, B-5, B-102, B-163 PMFLO........................................................................................................................... 3-24, B-5, B-106, B-165 PMINH ........................................................................................................................... 3-24, B-4, B-107, B-164 PMINW .......................................................................................................................... 
3-24, B-4, B-109, B-164 PMSUBH.........................................................................................................................3-24, B-5, B-111, B-165 PMSUBW........................................................................................................................3-24, B-5, B-113, B-165 PMTHI.............................................................................................................................3-24, B-5, B-115, B-165 PMTHL............................................................................................................................3-24, B-5, B-116, B-163 PMTLO............................................................................................................................3-24, B-5, B-117, B-165 PMULTH .........................................................................................................................3-24, B-5, B-118, B-165 PMULTUW ..................................................................................................................... 3-24, B-5, B-120, B-165 PMULTW ....................................................................................................................... 3-24, B-5, B-122, B-165 PNOR............................................................................................................................. 3-25, B-4, B-124, B-165 pointer ....................................................................................................................................................4-9, A-92 POR ............................................................................................................................... 3-25, B-4, B-125, B-165 PPAC5 ........................................................................................................................... 3-25, B-5, B-126, B-164 PPACB ........................................................................................................................... 
3-25, B-5, B-128, B-164 PPACH........................................................................................................................... 3-25, B-5, B-129, B-164 PPACW .......................................................................................................................... 3-25, B-5, B-130, B-164 precise ............................................................................................................................................................ 9-4 prediction .................................................................................................................................1-2, 2-3, 4-23, 9-7 Prediction ...................................................................................................................................................... 4-23 PREF .......................................................................................3-19, 4-23, A-2, A-91, A-141, B-163, C-41, D-40 prefetch ...................................................................................................................................... 5-19, A-91, A-92 Prefetch.........................................................................................1-1, 1-2, 2-11, 2-17, 3-19, 8-8, 9-7, A-7, A-92 Prefix............................................................................................................................................................... 8-3 PREVH........................................................................................................................... 3-25, B-5, B-131, B-165 PRId ..............................................................................................................................................2-15, 4-5, 4-22 priorities ........................................................................................................................................................ 
12-7 privilege.......................................................................................................................................... 9-5, 9-11, C-8 privilege mode ....................................................................................................................................... 9-5, 9-11
Probe ......................................................................................................................... 3-20, 4-6, 4-14, 5-17, 6-20 PROT3W ....................................................................................................................... 3-25, B-5, B-132, B-165 Pseudo................................................................................................................................................... 2-15, 4-5 pseudocode .............................................................................................. A-1, A-2, A-3, A-4, A-6, A-8, B-2, D-2 Pseudocode ..................................................................................................................... A-3, A-4, A-6, B-2, D-2 PSLLH............................................................................................................................ 3-25, B-4, B-133, B-163 PSLLVW ........................................................................................................................ 3-25, B-4, B-134, B-165 PSLLW ........................................................................................................................... 3-25, B-4, B-135, B-163 PSRAH........................................................................................................................... 3-25, B-4, B-136, B-163 PSRAVW ....................................................................................................................... 3-25, B-4, B-137, B-165 PSRAW .......................................................................................................................... 3-25, B-4, B-138, B-163 PSRLH ........................................................................................................................... 3-25, B-4, B-139, B-163 PSRLVW ........................................................................................................................ 
3-25, B-4, B-140, B-165 PSRLW .......................................................................................................................... 3-25, B-4, B-141, B-163 PSUBB........................................................................................................................... 3-24, B-3, B-142, B-164 PSUBH........................................................................................................................... 3-24, B-3, B-143, B-164 PSUBSB ........................................................................................................................ 3-24, B-3, B-144, B-164 PSUBSH ........................................................................................................................ 3-24, B-3, B-148, B-164 PSUBSW ....................................................................................................................... 3-24, B-3, B-150, B-164 PSUBUB ........................................................................................................................ 3-24, B-3, B-152, B-164 PSUBUH ........................................................................................................................ 3-24, B-3, B-155, B-164 PSUBUW ....................................................................................................................... 3-24, B-3, B-157, B-164 PSUBW.......................................................................................................................... 3-24, B-3, B-159, B-164 PTagLo................................................................................................................................................. 
4-31, 4-32 PTE .................................................................................................................................................2-15, 4-5, 4-9 PTEBase......................................................................................................................................................... 4-9 PTEs ............................................................................................................................................................... 4-9 PXOR............................................................................................................................. 3-25, B-4, B-160, B-165
Q
QFSRV.............................................................................................. 3-25, B-5, B-20, B-21, B-22, B-161, B-164 qNaN..............................................................................................................................................................11-6 Quadword ...................................................................................... 1-2, 3-5, 3-8, 3-10, 3-12, 3-25, 8-9, B-4, B-5 QUADWORD .............................................................................................................................A-7, B-10, B-162 Quintibyte............................................................................................................................................. 3-10, 3-12 quotient .........................................................................................................................4-4, A-38, A-40, B-7, B-9
R
R10000 ........................................................................................................................................................... 1-3 R4000 ...................................................................................................................................................... 1-3, 6-2 random...................................................................................................................................2-15, 4-5, 4-11, 6-2 Random ................................................................2-15, 3-20, 4-5, 4-7, 4-11, 4-14, 5-11, 5-16, 5-17, 6-20, C-40
Random5 ......................................................................................................................................................C-40 Refill ..................... 2-3, 2-17, 4-12, 4-14, 5-2, 5-7, 5-9, 5-16, 8-8, A-56, A-57, A-58, A-62, A-66, A-67, A-68, A-70, A-74, A-78, A-79, A-93, A-94, A-98, A-102, A-103, A-116, A-120, A-124, B-10, B-162, C-7, C-8, D-26, D-37 REGIMM ................................................................................................ 5-22, A-141, A-142, B-163, C-41, D-40 register ............................................................................................................. 10-2, 10-6, 11-2, 11-3, 11-8, 11-9 Register................ 2-5, 2-6, 2-8, 2-15, 3-14, 3-15, 3-17, 3-20, 3-25, 4-3, 4-4, 4-5, 4-6, 4-7, 4-8, 4-9, 4-10, 4-11, 4-12, 4-13, 4-14, 4-15, 4-16, 4-17, 4-18, 4-19, 4-21, 4-22, 4-23, 4-25, 4-26, 4-27, 4-28, 4-29, 4-30, 4-32, 4-33, 5-8, 6-9, 6-10, 6-12, 6-16, 8-25, 9-2, 9-3, 9-4, 9-10, 10-7, 10-8, 10-9, 13-2, 13-3, 13-4, 13-5, 13-7, 13-8, 13-9, A-3, A-4, A-5, A-9, A-54, B-3, B-5, B-161 registers ........................................................................................................................................................ 10-4 Registers.......2-1, 2-3, 2-14, 2-15, 3-17, 4-1, 4-2, 4-3, 4-4, 4-5, 4-8, 4-26, 4-28, 4-31, 6-14, 9-2, 9-3, 9-4, 13-3 REL ............................................................................................................................................. 8-11, 8-14, 8-15 Request........................................................................................................................................................... 9-9 Res......................................................................................................................................................... 
4-19, 5-8 Reset.........................................................4-18, 4-19, 5-1, 5-2, 5-7, 5-8, 5-9, 5-10, 5-11, 8-11, 9-4, 12-6, 13-14 RESET ............................................................................................................................... 5-11, 5-12, 8-11, 8-14 RI ................................................................................................................................. 2-16, 4-20, 5-8, 5-22, 6-1 Root .............................................................................................................................................................. 3-21 Rotate ....................................................................................................................................................3-25, B-5 ROUND.L......................................................................................................................................................D-32 ROUND.L.fmt........................................................................................................................... 3-21, 10-14, D-41 ROUND.W ....................................................................................................................................................D-33 ROUND.W.fmt ......................................................................................................................... 3-21, 10-14, D-41 RSQRT ................................................................................................................................................ 2-18, 3-26
S
S0...........................................................................................................................................4-29, 9-2, 9-5, 9-11 S1.................................................................................................................................................. 4-29, 9-5, 9-11 sa ......................... 3-3, A-41, A-42, A-44, A-45, A-47, A-48, A-104, A-110, A-112, B-133, B-135, B-136, B-138, B-139, B-141 SA ......................................2-3, 2-11, 2-12, 2-13, 2-14, 3-25, 4-1, 4-2, 4-3, 4-4, B-17, B-20, B-21, B-22, B-161 Saturate ................................. B-34, B-36, B-38, B-41, B-43, B-45, B-147, B-149, B-151, B-154, B-156, B-158 saturation ........................B-3, B-31, B-35, B-37, B-39, B-42, B-44, B-144, B-148, B-150, B-152, B-155, B-157 Saturation...............................................................................................................................................3-24, B-3 SB .............................................................................................................. 3-4, A-93, A-141, B-163, C-41, D-40 SC .................................................................................................................1-2, 3-4, A-142, B-165, C-42, D-41 SCD ..............................................................................................................1-2, 3-4, A-142, B-165, C-42, D-41 SD ..............................................................................................3-4, 13-8, A-5, A-94, A-141, B-163, C-41, D-40 SDC1 .....................................................................................3-5, 3-21, 10-13, A-141, B-163, C-41, D-34, D-40 SDL ..................................................................................3-4, 3-8, A-95, A-96, A-99, A-141, B-163, C-41, D-40
SDR ...............................................................................3-4, 3-8, A-95, A-99, A-100, A-141, B-163, C-41, D-40 segment .................................................................................................................. 2-16, 4-9, 6-1, 6-8, 6-9, 13-9 Segment........................................................................................................................................6-9, 6-10, 6-12 Semaphore ..................................................................................................................................................... 3-4 Septibyte .............................................................................................................................................. 3-10, 3-12 Serialization .................................................................................................................................................. 3-19 Sextibyte .............................................................................................................................................. 3-10, 3-12 SH .................................................................................................3-4, A-103, A-141, B-102, B-163, C-41, D-40 Shift..................................................................................... 2-3, 2-11, 3-14, 3-15, 3-25, 3-26, 4-2, 4-4, B-4, B-5 Shifter.............................................................................................................................................................. 2-3 shutdown......................................................................................................................................................... 6-2 sign ...................... 
2-7, 2-9, 2-16, 3-4, 3-16, 3-17, 6-1, 6-3, 10-10, 10-11, 10-12, 13-8, A-11, A-12, A-13, A-14, A-17, A-18, A-19, A-20, A-21, A-22, A-23, A-24, A-25, A-26, A-27, A-28, A-29, A-30, A-31, A-32, A-35, A-36, A-38, A-39, A-40, A-44, A-45, A-46, A-56, A-57, A-58, A-60, A-64, A-67, A-68, A-69, A-70, A-71, A-72, A-74, A-75, A-76, A-78, A-79, A-86, A-87, A-92, A-93, A-94, A-96, A-99, A-100, A-103, A-104, A-105, A-107, A-108, A-110, A-111, A-112, A-113, A-114, A-115, A-116, A-117, A-118, A-121, A-122, A-128, A-130, A-131, A-134, A-135, A-138, B-7, B-9, B-10, B-11, B-12, B-13, B-14, B-23, B-24, B-25, B-26, B-68, B-70, B-93, B-95, B-113, B-120, B-122, B-136, B-137, B-138, B-140, B-162, C-2, C-3, C-4, C-5, C-6, D-2, D-14, D-27, D-31 Sign............................................................................................................................................................. 10-10 sign_extend.......... A-11, A-12, A-13, A-14, A-17, A-18, A-19, A-20, A-21, A-22, A-23, A-24, A-25, A-26, A-27, A-28, A-29, A-30, A-31, A-32, A-35, A-36, A-38, A-40, A-56, A-57, A-58, A-60, A-64, A-67, A-68, A-69, A-70, A-72, A-76, A-79, A-92, A-93, A-94, A-96, A-100, A-103, A-104, A-105, A-107, A-108, A-110, A-111, A-112, A-113, A-114, A-115, A-116, A-118, A-122, A-128, A-130, A-131, A-134, A-135, A-138, B-10, B-162, C-2, C-3, C-4, C-5, D-14, D-27 Signal ............................................................................................................................................... 8-3, 8-7, A-8 SignalException ... A-8, A-11, A-12, A-33, A-34, A-35, A-50, A-58, A-67, A-68, A-70, A-79, A-94, A-103, A-114, A-116, A-126, A-127, A-128, A-129, A-130, A-131, A-132, A-133, A-134, A-135, A-136, A-137, A-138 SIO........................................ 
4-17, 4-18, 4-19, 4-33, 5-2, 5-5, 5-7, 5-8, 5-9, 5-10, 5-25, 8-10, 12-6, 13-8, C-14 SIOINT .......................................................................................................................................................... 8-10 SIOP .................................................................................................................................................... 4-19, 5-25 sll.................................................12-10, 12-11, 12-12, 12-13, 12-14, 12-15, 12-16, 12-17, 12-18, 12-19, 12-20 SLL......................................................................................................................3-15, A-74, A-78, A-104, A-141 SLLV ...................................................................................................................3-15, A-74, A-78, A-105, A-141 SLT......................................................................................................................3-15, A-82, A-83, A-106, A-141 SLTI..................................................................................... 3-14, A-82, A-83, A-107, A-141, B-163, C-41, D-40 SLTIU .................................................................................. 3-14, A-82, A-83, A-108, A-141, B-163, C-41, D-40 SLTU ...................................................................................................................3-15, A-82, A-83, A-109, A-141
SLW ............................................................................................................................................................B-102 Snooping....................................................................................................................................................... 2-17 SPECIAL.................................................................................................... 5-22, A-9, A-141, B-163, C-41, D-40 SQ.................................................................................. 3-5, 3-25, 13-8, A-141, B-4, B-162, B-163, C-41, D-40 SQRT ......................................................................................................................................... 2-18, 3-26, D-35 SQRT.fmt ................................................................................................................................. 3-21, 10-14, D-41 Square .......................................................................................................................................................... 3-21 SquareRoot...................................................................................................................................................D-35 SR .......................................................................................................................................................... 1-5, 4-16 SRA........................................................................................................................................ 3-15, A-110, A-141 SRAV ..................................................................................................................................... 3-15, A-111, A-141 SRL ........................................................................................................................................ 
3-15, A-112, A-141 SRLV...................................................................................................................................... 3-15, A-113, A-141 sseg ....................................................................................................................................................... 6-7, 6-10 State......................................................................................................................................................... 6-6, 9-4 Status................... 1-5, 2-15, 3-5, 3-20, 3-21, 4-5, 4-16, 4-17, 4-18, 4-21, 4-25, 4-29, 5-2, 5-5, 5-7, 5-9, 5-11, 5-12, 5-13, 5-14, 5-16, 5-19, 5-23, 5-24, 5-25, 6-2, 6-6, 6-8, 6-9, 6-10, 6-11, 6-12, 6-13, 8-25, 10-2, 10-4, 10-7, 10-8, 10-9, 11-2, 11-8, 11-9, 12-3, 12-4, 13-4, C-1, C-7, C-9, C-13, C-14, C-15, C-16 STATUS ............................................................................................................ 9-2, 9-10, 9-11, 12-6, 13-5, 13-6 steering .................................................................................................................................................. 2-6, 4-31 SteeringBits ..................................................................................................................................................C-10 stepping .............................................................................................................1-2, 9-8, 9-10, B-20, B-21, B-22 StoreFPR ............. D-2, D-4, D-5, D-12, D-13, D-16, D-17, D-18, D-19, D-20, D-23, D-24, D-28, D-30, D-31, D-32, D-33, D-35, D-36, D-38, D-39 StoreMemory ............................................... 
A-7, A-93, A-94, A-96, A-100, A-103, A-116, A-118, A-122, B-162 SUB............................................................................................................2-18, 3-15, 5-26, A-114, A-141, D-36 SUB.fmt ................................................................................................................................... 3-21, 10-14, D-41 Subroutine..................................................................................................................................................... 3-17 Subsequent............................................................................................................................................ 2-4, 6-17 Subtract....................................................................................................................... 3-15, 3-21, 3-24, B-3, B-5 SUBU ..........................................................................................................................3-15, A-114, A-115, A-141 supervisor ............................................................................................ 4-18, 5-15, 6-10, 6-12, 9-11, 13-5, 13-14 Supervisor............ 2-16, 2-19, 4-17, 4-18, 4-29, 5-2, 5-15, 5-22, 5-23, 6-6, 6-7, 6-10, 6-12, 9-2, 13-5, 13-6, C-1, C-14, C-15 SUPERVISOR ................................................................................................................................................ 9-5 suseg ..................................................................................................................................................... 6-7, 6-10 SW .................................................................................................... 
3-4, A-5, A-116, A-141, B-163, C-41, D-40 SWC1............................................................................3-5, 3-21, 10-13, 13-2, A-141, B-163, C-41, D-37, D-40 SWC2.......................................................................................................................... A-142, B-165, C-42, D-41
SWL ........................................................................... 3-4, 3-8, A-117, A-118, A-121, A-141, B-163, C-41, D-40 SWR........................................................................... 3-4, 3-8, A-117, A-121, A-122, A-141, B-163, C-41, D-40 SYNC ................... 2-11, 2-12, 2-13, 3-19, 5-24, 6-17, 13-9, 13-16, 13-18, 13-20, A-125, A-141, C-13, C-27, C-28, C-29, C-30, C-31, C-32, C-33, C-34, C-35, C-36, C-38, C-39, C-40 Synchronization ................................................................................................................................... 2-11, 3-19 Sys ................................................................................................................................................4-20, 5-8, 5-20 SYS................................................................................................................................................................. 8-3 SYSAACK .......................................... 8-3, 8-9, 8-12, 8-13, 8-14, 8-16, 8-19, 8-22, 8-25, 8-26, 8-27, 8-28, 8-29 SYSADDR................................................................................................................................................ 8-3, 8-7 SYSASTART................................................................................................8-3, 8-7, 8-9, 8-12, 8-13, 8-16, 8-19 SYSBE ..................................................................................................................................................... 8-3, 8-7 Syscall......................................................................................................................................4-20, 5-2, 5-8, 5-9 SYSCALL..............................................................................2-11, 3-18, 4-4, 5-10, 5-20, 9-7, 9-8, A-126, A-141 SYSDACK............................ 
8-3, 8-10, 8-12, 8-13, 8-16, 8-17, 8-19, 8-20, 8-22, 8-25, 8-26, 8-27, 8-28, A-125 SYSDATA................................................................................................................ 8-3, 8-6, 8-7, 8-9, 8-16, 8-17 SYSDSTART......................................................................... 8-3, 8-10, 8-12, 8-13, 8-16, 8-17, 8-19, 8-20, 8-25 SYSRD............................................................................................................................................................ 8-3 SYSTSIZE........................................................................................................... 8-3, 8-9, 8-12, 8-13, 8-16, 8-19 SYSWR........................................................................................................................................................... 8-3
T
Tag ..................................................................................................... 2-6, 2-7, 2-15, 4-5, C-9, C-11, C-12, C-13 TAG.................................................................................................................................................................C-6 TagHi................................................................................................................................... 2-15, 4-5, 4-31, 4-32 TagHI................................................................................................................................................... C-10, C-11 TagLo .................................................................................................................................. 2-15, 4-5, 4-31, 4-32 TagLO ............................................................................................................................... C-9, C-10, C-11, C-12 tags ..............................................................................................................................................4-31, C-9, C-12 TargetAddress..................................................................................................................................... C-10, C-11 TEQ....................................................................................................................... 3-18, 5-27, 9-8, A-127, A-141 TEQI...................................................................................................................... 
3-18, 5-27, 9-8, A-128, A-142 TGE...............................................................................................................................3-18, 5-27, A-129, A-141 TGEI..............................................................................................................................3-18, 5-27, A-130, A-142 TGEIU ...........................................................................................................................3-18, 5-27, A-131, A-142 TGEU ............................................................................................................................3-18, 5-27, A-132, A-141 timer ............................................................................................................................................4-13, 4-15, 4-16 TLB ...................... 1-2, 2-3, 2-6, 2-7, 2-15, 2-16, 3-20, 4-5, 4-6, 4-7, 4-8, 4-9, 4-10, 4-11, 4-12, 4-14, 4-17, 4-20, 4-29, 5-2, 5-7, 5-8, 5-9, 5-10, 5-11, 5-12, 5-16, 5-17, 5-18, 6-1, 6-2, 6-3, 6-4, 6-7, 6-8, 6-9, 6-12, 6-14, 6-15, 6-16, 6-17, 6-18, 6-19, 6-20, 12-6, A-6, A-56, A-57, A-58, A-62, A-66, A-67, A-68, A-70, A-74, A-78, A-79, A-92, A-93, A-94, A-98, A-102, A-103, A-116, A-120, A-124, B-10, B-162, C-6, C-7, C-8, C-28, C-37, C-38, C-39, C-40, D-26, D-37
TLBEnteries ..................................................................................................................................................C-37 TLBL ............................................................................................................................ 4-8, 4-20, 5-8, 5-16, 5-17 TLBP ............................................................................................... 3-20, 4-6, 5-17, 5-18, 6-2, 6-20, C-37, C-42 TLBR................................................................................................................2-13, 3-20, 4-6, 6-20, C-38, C-42 TLBS ............................................................................................................................ 4-8, 4-20, 5-8, 5-16, 5-17 TLBWI...................................................................................2-13, 3-20, 4-6, 4-8, 6-20, C-28, C-38, C-39, C-42 TLBWR .................................................................................2-13, 3-20, 4-7, 4-8, 6-20, C-28, C-38, C-40, C-42 TLT................................................................................................................................3-18, 5-27, A-133, A-141 TLTI...............................................................................................................................3-18, 5-27, A-134, A-142 TLTIU ............................................................................................................................3-18, 5-27, A-135, A-142 TLTU .............................................................................................................................3-18, 5-27, A-136, A-141 TNE...............................................................................................................................3-18, 5-27, A-137, A-141 TNEI..............................................................................................................................3-18, 5-27, A-138, A-142 
TPC................................................................................................................................... 12-3, 12-5, 12-6, 12-7 TPCE ..........................................................................................................................................12-3, 12-5, 12-6 Trace...........................................................................................................................................12-1, 12-2, 12-3 transaction ................................................................................................................. 8-8, 8-10, 8-12, 8-14, 8-22 Translation ............................................................................................. 2-3, 6-2, 6-3, 6-4, 6-5, 6-18, 6-19, 6-20 translations .................................................................................................................................... 4-9, 6-1, A-92 Trap...................... 2-11, 3-18, 4-20, 5-2, 5-8, 5-9, 5-10, 5-27, 9-8, A-127, A-128, A-129, A-130, A-131, A-132, A-133, A-134, A-135, A-136, A-137, A-138 TRAP ..............................................................................................................................................4-4, 5-27, 9-7 TRIG .................................................................................................................................................. 13-9, 13-20 Trigger.................................................................................................................................................. 2-19, 13-6 Triplebyte ............................................................................................................................................. 3-10, 3-12 TRUNC.L. 
.....................................................................................................................................................D-38 TRUNC.L.fmt ........................................................................................................................... 3-21, 10-14, D-41 TRUNC.W .....................................................................................................................................................D-39 TRUNC.W.fmt .......................................................................................................................... 3-21, 10-14, D-41
U
U0 ..........................................................................................................................................4-29, 9-2, 9-5, 9-11 U1 ................................................................................................................................................. 4-29, 9-5, 9-11 UCA ................................................................................................................................................................ 9-7 UCAB ............................................................................................................................... 2-4, 2-6, 2-7, 6-17, 9-9 unaligned ...........................................3-8, 13-8, A-59, A-63, A-71, A-74, A-75, A-78, A-95, A-99, A-117, A-121 uncached ............. 1-1, 2-4, 5-11, 5-12, 6-12, 6-16, 6-17, 8-12, 9-8, 9-9, 9-10, A-6, A-8, A-56, A-57, A-58, A-60, A-64, A-67, A-68, A-70, A-72, A-76, A-79, A-91, A-92, A-93, A-94, A-96, A-100, A-103, A-116, A-118, A-122, A-125, B-10, B-162, C-6, C-7 Uncached............................................................................. 2-4, 4-8, 4-24, 6-7, 6-17, 6-20, 8-8, 8-12, 9-7, 9-10 UndefinedResult .. A-8, A-11, A-12, A-13, A-14, A-38, A-40, A-86, A-87, A-110, A-111, A-112, A-113, A-114,
A-115, B-7, B-9, B-11, B-12, B-13, B-14, B-23, B-24, B-25, B-26, B-68, B-70, B-93, B-95, B-113, B-120, B-122 underflow ............. 2-9, B-29, B-30, B-31, B-35, B-37, B-46, B-47, B-142, B-143, B-144, B-148, B-150, B-152, B-155, B-157, B-159 Underflow............................................................ B-31, B-35, B-37, B-144, B-148, B-150, B-152, B-155, B-157 UNIX ............................................................................................................................................A-39, B-8, B-67 unmapped ...................................................5-11, 5-12, 6-7, 6-12, 9-8, 9-10, 13-9, A-6, C-28, C-38, C-39, C-40 Unmapped ...................................................................................................................................................... 6-7 Unsigned.......................................................................3-4, 3-14, 3-15, 3-16, 3-18, 3-23, 3-24, B-3, B-5, B-158 useg ..................................................................................................................................................6-7, 6-8, 6-9 UW ..............................................................................................................................................................B-102
V
VA ..............................................................................................................C-6, C-7, C-8, C-9, C-10, C-11, C-12 VALID..............................................................................................................................................................C-9 VALUE ..........................................................................................................................................4-28, 4-30, 9-2 Value FPR.....................................................................................................................................................D-10 ValueFPR.......................................................................................................................... D-4, D-12, D-13, D-16 VAX ................................................................................................................................................................. 3-6 VPN..........................................................................................................................................4-9, 5-15, 6-4, 6-5 VPN2................................................................................................................................ 4-14, 6-16, C-39, C-40
W
WBB............................................................................................................................... 2-4, 4-29, 8-15, 9-6, 9-9 Wide...................................................................................................................................2-10, 2-11, 2-12, 2-13 wired ............................................................................................................................................. 2-15, 4-5, 4-11 Wired.............................................................................................................................2-15, 4-5, 4-7, 4-11, 5-11 WORD ................................................................................................................. A-7, A-70, A-79, A-116, A-122 writeback.......................................................................................................................................................A-91 Writeback ........................................................................................................... 2-4, C-7, C-8, C-11, C-12, C-13 WRITEBACK.........................................................................................................................................C-6, C-13
X
XOR ....................................................................................... 3-15, 3-25, A-3, A-139, A-140, A-141, B-4, B-160 XORI ...................................................................................................... 3-14, A-140, A-141, B-163, C-41, D-40
Appendix A CPU Instruction Set Details
A. CPU Instruction Set Details
This appendix provides a detailed description of the operation of each instruction. The instructions are listed in alphabetical order. Exceptions that may occur due to the execution of each instruction are listed after the description of each instruction. Descriptions of the immediate cause and manner of handling exceptions are omitted from the instruction descriptions in this appendix. Descriptions use a pseudocode notation explained in Section A.2. For an overview of the instruction set, refer to Chapter 3 of the User's Manual.
A.1 Description of an Instruction
Each instruction description contains several sections that contain specific information about the instruction. The following sections describe the contents of each section in detail.
A.1.1 Instruction Mnemonic and Name
The instruction mnemonic and name are printed as page headings for each page in the instruction description.
A.1.2 Instruction Encoding Picture
The instruction word encoding is shown in pictorial form at the top of the instruction description. The picture shows the values of all constant fields and the opcode names for opcode fields in upper-case. It labels all variable fields with lower-case names that are used in the instruction description. Fields that contain zeroes but are not named are unused fields that are required to be zero.
A.1.3 Format
The assembler formats for the instruction and the architecture level at which the instruction was originally defined are shown.
A.1.4 Purpose
This is a very short statement of the purpose of the instruction.
A.1.5 Description
If a one-line symbolic description of the instruction is feasible, it will appear immediately to the right of the Description heading. The body of the section is a description of the operation of the instruction in text, tables, and figures. This description complements the high-level language description in the Operation section.
A.1.6 Restrictions
This section documents the restrictions on the instructions. Most restrictions fall in the category of alignment requirements for memory addresses, valid values of operands, and order of instructions necessary to guarantee correct execution.
A.1.7 Operation
This section describes the operation as pseudocode in a high-level language notation resembling Pascal. The purpose of this section is to describe the operation of the instruction clearly in a form with less ambiguity than prose.
A.1.8 Exceptions
This section lists the exceptions that can be caused by the operation of the instruction. It omits exceptions that can be caused by instruction fetch, performance counters, and breakpoints. It also omits exceptions that can be caused by asynchronous external events, e.g. interrupts. Although the Bus Error exception may be caused by the operation of a load, store, or PREF instruction, this section does not list Bus Error for those instructions because the relationship between these instructions and external error conditions such as Bus Error is asynchronous and implementation specific.
A.1.9 Programming Notes, Implementation Notes
These sections contain material that is useful for programmers and implementors respectively but is not necessary to describe the instruction and does not belong in the description sections.
A.2 Instruction Description Notation and Functions
The Operation sections of the instruction descriptions describe the operation performed by each instruction using a high-level language notation, or pseudocode. Symbols, functions, and structures used in the Operation sections are described here.
A.2.1.1 Pseudocode Language Statement Execution
Each of the high-level language statements in an operation description is executed in sequential order (as modified by conditional and loop constructs).
A.2.1.2 Pseudocode Symbols
Special symbols used in the notation are described in Table A-1.
Table A-1. Symbols in Instruction Operation Statements

Symbol            Meaning
←                 Assignment.
=, ≠              Tests for equality and inequality.
||                Bit string concatenation.
x^y               A y-bit string formed by y copies of the single-bit value x.
x_{y..z}          Selection of bits y through z of bit string x.
+, -              Two's complement or floating point arithmetic: addition, subtraction.
*, ×              Two's complement or floating point multiplication (both used for either).
div               Two's complement integer division.
mod               Two's complement modulo.
/                 Floating point division.
<                 Two's complement less than comparison.
not               Bit-wise logical NOT.
nor               Bit-wise logical NOR.
xor               Bit-wise logical XOR.
and               Bit-wise logical AND.
or                Bit-wise logical OR.
GPRLEN            The length in bits (64 in the C790) of the CPU General Purpose Registers.
GPR[x]            CPU General Purpose Register x. The content of GPR[0] is always zero.
CPR[z, x]         Coprocessor unit z, general register x.
CCR[z, x]         Coprocessor unit z, control register x.
CPCOND[z]         Coprocessor unit z condition signal.
BigEndian         Big-endian mode as configured at reset (0 → Little, 1 → Big) from core boundary signal.

I:, I+n:, I-n:
    This occurs as a prefix to operation description lines and functions as a label. It indicates the instruction time during which the effects of the pseudocode lines appear to occur (i.e., when the pseudocode is "executed"). Unless otherwise indicated, all effects of the current instruction appear to occur during the instruction time of the current instruction. No label is equivalent to a time label of "I:".
    Sometimes effects of an instruction appear to occur either earlier or later, during the instruction time of another instruction. When that happens, the instruction operation is written in sections labeled with the instruction time, relative to the current instruction I, in which the effect of that pseudocode appears to occur. For example, an instruction may have a result that is not available until after the next instruction. Such an instruction will have the portion of the instruction operation description that writes the result register in a section labeled "I+1:".
    The effect of pseudocode statements for the current instruction labeled "I+1:" appears to occur "at the same time" as the effect of pseudocode statements labeled "I:" for the following instruction. Within one pseudocode sequence the effects of the statements take place in order. However, between sequences of statements for different instructions that occur "at the same time", there is no order defined. Programs must not depend on a particular order of evaluation between such sections.

PC
    The Program Counter value. During the instruction time of an instruction this is the address of the instruction word. The address of the instruction that occurs during the next instruction time is determined by assigning a value to PC during an instruction time. If no value is assigned to PC during instruction time by any pseudocode statement, it is automatically incremented by 4 before the next instruction time. A taken branch assigns the target address to PC during the instruction time of the instruction in the branch delay slot.

PSIZE
    The size, in number of bits, of the physical address in an implementation.
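The bit-string operators of Table A-1 (repetition, bit selection, and concatenation) can be modeled on plain integers. The following is a minimal sketch, not part of the manual: the helper names `select_bits`, `replicate`, and `concat` are hypothetical, and it assumes bit 0 is the least-significant bit, as the encoding pictures suggest.

```python
def select_bits(x, y, z):
    """Bit selection: bits y down to z of x (assumes bit 0 is the LSB)."""
    width = y - z + 1
    return (x >> z) & ((1 << width) - 1)


def replicate(bit, count):
    """Repetition: a count-bit string made of copies of a single-bit value."""
    return (1 << count) - 1 if bit & 1 else 0


def concat(*fields):
    """Concatenation of (value, width) fields, most significant field first.

    Returns the combined (value, width) pair so widths are never lost.
    """
    value, width = 0, 0
    for field_value, field_width in fields:
        value = (value << field_width) | (field_value & ((1 << field_width) - 1))
        width += field_width
    return value, width


# A 16-bit field followed by two zero bits yields an 18-bit string.
shifted, width = concat((0x0003, 16), (replicate(0, 2), 2))
print(hex(shifted), width)
```

Carrying the width alongside the value is a design choice of this sketch; it keeps leading zero bits meaningful, which a bare integer cannot express.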
A.2.2 Definitions of Pseudocode Functions Used in Instruction Descriptions
A variety of functions are used in the pseudocode employed in the instruction descriptions. These functions are used to make the pseudocode more readable and also to abstract implementation-specific behavior. These functions are defined in this section. Certain additional functions specific to a particular coprocessor are described at the beginning of the appendix for that coprocessor.
A.2.2.1 Coprocessor General Register Access Pseudocode Functions
Defined coprocessors, except for COP0, have instructions to exchange words and doublewords and quadwords between coprocessor general registers and the rest of the system. What a coprocessor does with a word or doubleword supplied to it, and how a coprocessor supplies a word or doubleword, is defined by the coprocessor itself. The functions are listed in Table A-2.
Table A-2. Coprocessor General Register Access Functions

COP_LW(z, rt, memword)
    z: The coprocessor unit number.
    rt: Coprocessor general register specifier.
    memword: A 32-bit word value supplied to the coprocessor.
    This is the action taken by coprocessor z when supplied with a word from memory during a load word operation. The action is coprocessor-specific. The typical action would be to store the contents of memword in coprocessor general register rt.

COP_LD(z, rt, memdouble)
    z: The coprocessor unit number.
    rt: Coprocessor general register specifier.
    memdouble: A 64-bit doubleword value supplied to the coprocessor.
    This is the action taken by coprocessor z when supplied with a doubleword from memory during a load doubleword operation. The action is coprocessor-specific. The typical action would be to store the contents of memdouble in coprocessor general register rt.

dataword ← COP_SW(z, rt)
    z: The coprocessor unit number.
    rt: Coprocessor general register specifier.
    dataword: A 32-bit word value.
    This defines the action taken by coprocessor z to supply a word of data during a store word operation. The action is coprocessor-specific. The typical action would be to supply the contents of the low-order word in coprocessor general register rt.

datadouble ← COP_SD(z, rt)
    z: The coprocessor unit number.
    rt: Coprocessor general register specifier.
    datadouble: A 64-bit doubleword value.
    This defines the action taken by coprocessor z to supply a doubleword of data during a store doubleword operation. The action is coprocessor-specific. The typical action would be to supply the contents of the doubleword in coprocessor general register rt.

A.2.2.2 Load and Store Memory Pseudocode Functions
Regardless of byte-numbering order (endianness), the address of a halfword, word, or doubleword is the smallest byte address among the bytes in the object. For a big-endian ordering this is the most-significant byte; for a little-endian ordering this is the least-significant byte. In the operation description pseudocode for load and store operations, the functions listed in Table A-3 are used to summarize the handling of virtual addresses and accessing physical memory. The size of the data item to be loaded or stored is passed in the AccessLength field. The valid constant names and values are shown in Table A-4. The bytes within the addressed unit of memory (quadword for 128-bit processors) that are used can be determined directly from the AccessLength and the four low-order bits of the address.
Table A-3. Load and Store Functions

(pAddr, CCA) ← AddressTranslation (vAddr, IorD, LorS)
    pAddr: Physical Address.
    CCA: Cache Coherence Algorithm: the method used to access caches and memory and resolve the reference.
    vAddr: Virtual Address.
    IorD: Indicates whether access is for Instruction or Data.
    LorS: Indicates whether access is for Load or Store.
    Translate a virtual address to a physical address and a cache coherence algorithm describing the mechanism used to resolve the memory reference. Given the virtual address vAddr, and whether the reference is to Instructions or Data (IorD), find the corresponding physical address (pAddr) and the cache coherence algorithm (CCA) used to resolve the reference. If the virtual address is in one of the unmapped address spaces, the physical address and CCA are determined directly by the virtual address. If the virtual address is in one of the mapped address spaces, then the TLB is used to determine the physical address and access type; if the required translation is not present in the TLB or the desired access is not permitted, the function fails and an exception is taken.

MemElem ← LoadMemory (CCA, AccessLength, pAddr, vAddr, IorD)
    MemElem: Data is returned in a fixed width with a natural alignment. The width is the same size as the CPU general purpose register.
    CCA: Cache Coherence Algorithm: the method used to access caches and memory and resolve the reference.
    AccessLength: Length, in bytes, of access.
    pAddr: Physical Address.
    vAddr: Virtual Address.
    IorD: Indicates whether access is for Instructions or Data.
    Load a value from memory. Uses the cache and main memory as specified in the Cache Coherence Algorithm (CCA) and the sort of access (IorD) to find the contents of AccessLength memory bytes starting at physical location pAddr. The data is returned in the fixed width naturally-aligned memory element (MemElem). The low-order two, three, or four bits of the address and the AccessLength indicate which of the bytes within MemElem need to be given to the processor. If the memory access type of the reference is uncached, then only the referenced bytes are read from memory and valid within the memory element. If the access type is cached, and the data is not present in cache, an implementation specific size and alignment block of memory is read and loaded into the cache to satisfy a load reference. At a minimum, the block is the entire memory element.

StoreMemory (CCA, AccessLength, MemElem, pAddr, vAddr)
    CCA: Cache Coherence Algorithm: the method used to access caches and memory and resolve the reference.
    AccessLength: Length, in bytes, of access.
    MemElem: Data in the width and alignment of a memory element. The width is the same size as the CPU general purpose register. For a partial-memory-element store, only the bytes that will be stored must be valid.
    pAddr: Physical Address.
    vAddr: Virtual Address.
    Store a value to memory. The specified data is stored into the physical location pAddr using the memory hierarchy (data caches and main memory) as specified by the Cache Coherence Algorithm (CCA). The MemElem contains the data for an aligned, fixed-width memory element, though only the bytes that will actually be stored to memory need to be valid. The low-order four bits of pAddr and the AccessLength field indicate which of the bytes within the MemElem data should actually be stored; only these bytes in memory will be changed.

Prefetch (CCA, pAddr, vAddr, DATA, hint)
    CCA: Cache Coherence Algorithm: the method used to access caches and memory and resolve the reference.
    pAddr: Physical Address.
    vAddr: Virtual Address.
    DATA: Indicates that access is for DATA.
    hint: Hint that indicates the possible use of the data.
    Prefetch data from memory. Prefetch is an advisory instruction for which an implementation specific action is taken. The action taken may increase performance but must not change the meaning of the program or alter architecturally-visible state.
Table A-4. AccessLength Specifications for Loads / Stores

AccessLength name   Value   Meaning
QUADWORD            15      16 bytes (128 bits)
DOUBLEWORD          7       8 bytes (64 bits)
SEPTIBYTE           6       7 bytes (56 bits)
SEXTIBYTE           5       6 bytes (48 bits)
QUINTIBYTE          4       5 bytes (40 bits)
WORD                3       4 bytes (32 bits)
TRIPLEBYTE          2       3 bytes (24 bits)
HALFWORD            1       2 bytes (16 bits)
BYTE                0       1 byte (8 bits)
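As stated above, the bytes used within the addressed quadword follow from the AccessLength value and the four low-order address bits. A minimal sketch of that calculation follows; the function name `bytes_in_quadword` is hypothetical, the result is expressed as byte addresses relative to the aligned quadword, and the mapping of those bytes to data-bus bit lanes (which depends on endianness) is not modeled.

```python
# AccessLength encodings from Table A-4 (value = byte count - 1).
ACCESS_LENGTH = {
    "QUADWORD": 15, "DOUBLEWORD": 7, "SEPTIBYTE": 6, "SEXTIBYTE": 5,
    "QUINTIBYTE": 4, "WORD": 3, "TRIPLEBYTE": 2, "HALFWORD": 1, "BYTE": 0,
}


def bytes_in_quadword(addr, access_length):
    """Return the byte offsets (0..15) touched inside the aligned quadword."""
    first = addr & 0xF             # four low-order bits of the address
    count = access_length + 1      # encoded value is the byte count minus one
    if first + count > 16:
        # Assumption of this sketch: an access never spans two quadwords.
        raise ValueError("access crosses the quadword boundary")
    return list(range(first, first + count))


print(bytes_in_quadword(0x1004, ACCESS_LENGTH["WORD"]))
```

Because the address of an object is always its smallest byte address, the offsets are the same for big-endian and little-endian configurations; only the significance of each byte differs.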
A.2.2.3 Miscellaneous Functions
Table A-5 describes additional miscellaneous functions for CPU instruction descriptions.
Table A-5. Miscellaneous Functions

SyncOperation (stype)
    stype: Type of synchronization operation to be performed.
    Based on the value of stype, either a memory barrier operation or a pipeline barrier operation is performed.
    In case of a memory barrier, all pending loads and stores are retired. Loads are retired when the destination register is written. Stores are retired when the stored data (in store buffers or write buffers) is either stored in the data cache or sent on the processor bus. All uncached accelerated data gathering operations are terminated. The uncached accelerated buffer is invalidated. All bus read processes due to load/store/pref/cache instructions are completed. All pending bus write processes in the write back buffer are completed.
    In case of a pipeline barrier, all instructions prior to the barrier are completed before the instructions following the barrier operation are fetched. Note that the barrier operation does not wait for any instruction which was issued prior to the barrier operation but not retired (e.g., multiply, divide, multicycle COP1 operations, or a pending load issued prior to the pipeline barrier operation).

SignalException (Exception)
    Exception: The exception condition that exists.
    Signal an exception condition. This will result in an exception that aborts the instruction. The instruction operation pseudocode will never see a return from this function call.

UndefinedResult()
    This function indicates that the result of the operation is undefined.

NullifyCurrentInstruction()
    Nullify the current instruction. This occurs during the instruction time for some instruction and that instruction is not executed further. This appears for branch-likely instructions during the execution of the instruction in the delay slot and it kills the instruction in the delay slot.

CoprocessorOperation (z, cop_fun)
    z: Coprocessor unit number.
    cop_fun: Coprocessor function from function field of instruction.
    Perform the specified Coprocessor operation.
A.3 CPU Instruction Formats
A CPU instruction is a single 32-bit aligned word. There are three instruction formats: Immediate (I-type), Jump (J-type), and Register (R-type). These formats are shown in Figure A-1 below:
I-Type (Immediate)
Bits    31..26   25..21   20..16   15..0
Field   op       rs       rt       immediate
Width   6        5        5        16

J-Type (Jump)
Bits    31..26   25..0
Field   op       target
Width   6        26

R-Type (Register)
Bits    31..26   25..21   20..16   15..11   10..6    5..0
Field   op       rs       rt       rd       sa       funct
Width   6        5        5        5        5        6

op          6-bit primary operation code
rd          5-bit destination register specifier
rs          5-bit source register specifier
rt          5-bit target (source/destination) register specification or branch condition
immediate   16-bit signed immediate used for: logical operands, arithmetic signed operands, load/store address byte offsets, PC-relative branch signed instruction displacement
target      26-bit index shifted left two bits to supply the low-order 28 bits of the jump target address
sa          5-bit shift amount
funct       6-bit function field used to specify functions within the primary operation code value SPECIAL

Figure A-1. CPU Instruction Formats
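The field boundaries in Figure A-1 translate directly into shifts and masks. The sketch below is illustrative only (the function name `decode_fields` is hypothetical); it extracts every field from a 32-bit instruction word without deciding which format applies, since that depends on the op value.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields of Figure A-1."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,        # bits 31..26
        "rs": (word >> 21) & 0x1F,        # bits 25..21
        "rt": (word >> 16) & 0x1F,        # bits 20..16
        "rd": (word >> 11) & 0x1F,        # bits 15..11 (R-type)
        "sa": (word >> 6) & 0x1F,         # bits 10..6  (R-type)
        "funct": word & 0x3F,             # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,       # bits 15..0  (I-type)
        "target": word & 0x03FFFFFF,      # bits 25..0  (J-type)
    }


# R-type word built from op=SPECIAL (000000), rs=1, rt=2, rd=3, funct=ADD (100000).
fields = decode_fields((1 << 21) | (2 << 16) | (3 << 11) | 0b100000)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```

Returning all fields at once keeps the helper format-agnostic; a caller would read only the fields that are meaningful for the decoded op.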
A.4 Instruction Descriptions
The user-level CPU instructions are described in alphabetical order in this section.
ADD    Add Word

Bits    31..26    25..21   20..16   15..11   10..6    5..0
Field   SPECIAL   rs       rt       rd       0        ADD
Value   000000                               00000    100000
Width   6         5        5        5        5        6

MIPS I

Format: ADD rd, rs, rt
Purpose: To add 32-bit integers. If overflow occurs, then trap.
Description: rd ← rs + rt
The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit result. If the addition results in 32-bit 2's complement arithmetic overflow then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR rd.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]_{63..0}) or NotWordValue(GPR[rt]_{63..0})) then UndefinedResult() endif
temp ← GPR[rs]_{63..0} + GPR[rt]_{63..0}
if (32_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rd]_{63..0} ← sign_extend(temp_{31..0})
endif
Exceptions:
Integer Overflow
Programming Notes:
ADDU performs the same arithmetic operation but does not trap on overflow.
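The difference between the trapping and non-trapping forms can be illustrated with a small model. This is a sketch under stated assumptions, not the processor's implementation: registers are modeled as unsigned 64-bit integers, the `IntegerOverflow` class and the helper names are hypothetical, and the undefined-result case is modeled by raising `ValueError`.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Hypothetical stand-in for the Integer Overflow exception."""


def is_word_value(reg):
    """True if bits 63..31 of a 64-bit register image are all equal."""
    upper = reg >> 31                      # the 33 bits 63..31
    return upper == 0 or upper == (1 << 33) - 1


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def add(rs_value, rt_value):
    """Model of ADD: traps on 32-bit two's complement overflow."""
    if not (is_word_value(rs_value) and is_word_value(rt_value)):
        raise ValueError("undefined result: operand is not a sign-extended word")
    total = to_signed32(rs_value) + to_signed32(rt_value)
    if not -(1 << 31) <= total < (1 << 31):
        raise IntegerOverflow()            # destination register is left unchanged
    return total & MASK64                  # sign-extended 64-bit register image


def addu(rs_value, rt_value):
    """Model of ADDU: same sum, wraps modulo 2**32 and never traps."""
    if not (is_word_value(rs_value) and is_word_value(rt_value)):
        raise ValueError("undefined result: operand is not a sign-extended word")
    return to_signed32(rs_value + rt_value) & MASK64


print(hex(addu(0x7FFFFFFF, 1)))
```

With the same operands, `add(0x7FFFFFFF, 1)` raises the overflow exception in this model, while `addu` wraps and returns the sign-extended result.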
ADDI    Add Immediate Word

Bits    31..26   25..21   20..16   15..0
Field   ADDI     rs       rt       immediate
Value   001000
Width   6        5        5        16

MIPS I

Format: ADDI rt, rs, immediate
Purpose: To add a constant to a 32-bit integer. If overflow occurs, then trap.
Description: rt ← rs + immediate
The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result. If the addition results in 32-bit 2's complement arithmetic overflow then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR rt.
Restrictions:
If GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]_{63..0})) then UndefinedResult() endif
temp ← GPR[rs]_{63..0} + sign_extend(immediate)
if (32_bit_arithmetic_overflow) then
    SignalException(IntegerOverflow)
else
    GPR[rt]_{63..0} ← sign_extend(temp_{31..0})
endif
Exceptions:
Integer Overflow
Programming Notes:
ADDIU performs the same arithmetic operation but does not trap on overflow.
ADDIU    Add Immediate Unsigned Word

Bits    31..26   25..21   20..16   15..0
Field   ADDIU    rs       rt       immediate
Value   001001
Width   6        5        5        16

MIPS I

Format: ADDIU rt, rs, immediate
Purpose: To add a constant to a 32-bit integer.
Description: rt ← rs + immediate
The 16-bit signed immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rt. No Integer Overflow exception occurs under any circumstances.
Restrictions:
If GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]_{63..0})) then UndefinedResult() endif
temp ← GPR[rs]_{63..0} + sign_extend(immediate)
GPR[rt]_{63..0} ← sign_extend(temp_{31..0})
Exceptions:
None
Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
ADDU    Add Unsigned Word

Bits    31..26    25..21   20..16   15..11   10..6    5..0
Field   SPECIAL   rs       rt       rd       0        ADDU
Value   000000                               00000    100001
Width   6         5        5        5        5        6

MIPS I

Format: ADDU rd, rs, rt
Purpose: To add 32-bit integers.
Description: rd ← rs + rt
The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd. No Integer Overflow exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]_{63..0}) or NotWordValue(GPR[rt]_{63..0})) then UndefinedResult() endif
temp ← GPR[rs]_{63..0} + GPR[rt]_{63..0}
GPR[rd]_{63..0} ← sign_extend(temp_{31..0})
Exceptions:
None
Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
AND    And

Bits    31..26    25..21   20..16   15..11   10..6    5..0
Field   SPECIAL   rs       rt       rd       0        AND
Value   000000                               00000    100100
Width   6         5        5        5        5        6

MIPS I

Format: AND rd, rs, rt
Purpose: To do a bitwise logical AND.
Description: rd ← rs AND rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND operation. The result is placed into GPR rd.
Restrictions:
None
Operation:
GPR[rd]_{63..0} ← GPR[rs]_{63..0} and GPR[rt]_{63..0}
Exceptions:
None
Programming Notes:
None
ANDI    And Immediate

Bits    31..26   25..21   20..16   15..0
Field   ANDI     rs       rt       immediate
Value   001100
Width   6        5        5        16

MIPS I

Format: ANDI rt, rs, immediate
Purpose: To do a bitwise logical AND with a constant.
Description: rt ← rs AND immediate
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical AND operation. The result is placed into GPR rt.
Restrictions:
None
Operation:
GPR[rt]_{63..0} ← zero_extend(immediate) and GPR[rs]_{63..0}
Exceptions:
None
Programming Notes:
None
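ANDI zero-extends its 16-bit immediate, whereas the add-immediate instructions sign-extend theirs. The sketch below contrasts the two extensions on a 64-bit register model; the helper names are hypothetical and registers are assumed to be unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def zero_extend16(immediate):
    """Zero-extend a 16-bit field: the upper 48 bits become 0."""
    return immediate & 0xFFFF


def sign_extend16(immediate):
    """Sign-extend a 16-bit field: bit 15 is copied into bits 63..16."""
    immediate &= 0xFFFF
    if immediate & 0x8000:
        return (immediate - 0x10000) & MASK64
    return immediate


def andi(rs_value, immediate):
    """Model of ANDI: bitwise AND of GPR rs with the zero-extended immediate."""
    return rs_value & zero_extend16(immediate)


print(hex(zero_extend16(0x8000)), hex(sign_extend16(0x8000)))
```

One consequence visible in this model is that ANDI always clears bits 63..16 of the result, regardless of the value in GPR rs.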
BEQ    Branch on Equal

Bits    31..26   25..21   20..16   15..0
Field   BEQ      rs       rt       offset
Value   000100
Width   6        5        5        16

MIPS I

Format: BEQ rs, rt, offset
Purpose: To compare GPRs then do a PC-relative conditional branch.
Description: if (rs = rt) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
None
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs]_{63..0} = GPR[rt]_{63..0})
I+1: if condition then
         PC ← PC + tgt_offset
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
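The target calculation and the resulting branch range can be checked with a short model. This is a sketch only: the function name `branch_target` is hypothetical, addresses are plain Python integers, and no address-width truncation is modeled.

```python
def branch_target(branch_pc, offset_field):
    """Compute the effective target of a PC-relative branch.

    branch_pc is the address of the branch itself; the 18-bit offset is
    added to the address of the delay-slot instruction (branch_pc + 4).
    """
    offset = (offset_field & 0xFFFF) << 2   # 16-bit field with two zero bits appended
    if offset & 0x20000:                    # bit 17 is the sign of the 18-bit value
        offset -= 0x40000
    return branch_pc + 4 + offset


# Farthest backward and forward targets relative to a branch at 0x100000.
print(hex(branch_target(0x100000, 0x8000)), hex(branch_target(0x100000, 0x7FFF)))
```

In this model the reachable targets span from 128 KB below the delay slot to just under 128 KB above it, in 4-byte steps.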
BEQL    Branch on Equal Likely

Bits    31..26   25..21   20..16   15..0
Field   BEQL     rs       rt       offset
Value   010100
Width   6        5        5        16

MIPS II

Format: BEQL rs, rt, offset
Purpose: To compare GPRs then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if (rs = rt) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are equal, branch to the target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
None
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs]_{63..0} = GPR[rt]_{63..0})
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
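The only difference between an ordinary branch and its "likely" form is what happens to the delay-slot instruction when the branch is not taken. A minimal sketch of that rule follows; the function name `resolve_branch` and its return convention are assumptions made for illustration.

```python
def resolve_branch(branch_pc, taken, target, likely):
    """Return (delay_slot_executes, pc_after_delay_slot) for a branch.

    branch_pc is the address of the branch; the delay slot sits at branch_pc + 4.
    """
    if taken:
        # Both forms execute the delay slot and then continue at the target.
        return True, target
    # Not taken: an ordinary branch still executes the delay slot,
    # a branch-likely nullifies it. Execution continues after the slot.
    return (not likely), branch_pc + 8


print(resolve_branch(0x100, False, 0x200, likely=False))
print(resolve_branch(0x100, False, 0x200, likely=True))
```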
BGEZ    Branch on Greater Than or Equal to Zero

Bits    31..26   25..21   20..16   15..0
Field   REGIMM   rs       BGEZ     offset
Value   000001            00001
Width   6        5        5        16

MIPS I

Format: BGEZ rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch.
Description: if (rs ≥ 0) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
None
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs]_{63..0} ≥ 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BGEZAL    Branch on Greater Than or Equal to Zero and Link

Bits    31..26   25..21   20..16   15..0
Field   REGIMM   rs       BGEZAL   offset
Value   000001            10001
Width   6        5        5        16

MIPS I

Format: BGEZAL rs, offset
Purpose: To test a GPR then do a PC-relative conditional procedure call.
Description: if (rs ≥ 0) then procedure_call
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution would continue after a procedure call. An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is undefined. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs]_{63..0} ≥ 0^GPRLEN
     GPR[31]_{63..0} ← zero_extend(PC + 8)
I+1: if condition then
         PC ← PC + tgt_offset
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more distant addresses.
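In the Operation pseudocode the link register is written before the condition is acted on, so GPR 31 receives the return address whether or not the branch is taken. The sketch below models that behavior; the function name `bgezal`, the list-based register file, and the use of `ValueError` for the undefined rs = 31 case are assumptions of this example.

```python
MASK64 = (1 << 64) - 1


def bgezal(gpr, branch_pc, rs, offset_field):
    """Model of BGEZAL; returns the PC that follows the delay slot.

    gpr is a list of 32 unsigned 64-bit register images and is updated in place.
    """
    if rs == 31:
        raise ValueError("undefined result: rs must not be GPR 31")
    taken = ((gpr[rs] >> 63) & 1) == 0        # sign bit clear means value >= 0
    gpr[31] = (branch_pc + 8) & MASK64        # link is written unconditionally
    offset = (offset_field & 0xFFFF) << 2     # 18-bit signed byte offset
    if offset & 0x20000:
        offset -= 0x40000
    return branch_pc + 4 + offset if taken else branch_pc + 8


registers = [0] * 32
registers[4] = MASK64                          # -1 as a 64-bit value: not taken
next_pc = bgezal(registers, 0x1000, 4, 0x0010)
print(hex(next_pc), hex(registers[31]))
```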
BGEZALL    Branch on Greater Than or Equal to Zero and Link Likely

Bits    31..26   25..21   20..16    15..0
Field   REGIMM   rs       BGEZALL   offset
Value   000001            10011
Width   6        5        5         16

MIPS II

Format: BGEZALL rs, offset
Purpose: To test a GPR then do a PC-relative conditional procedure call; execute the delay slot only if the branch is taken.
Description: if (rs ≥ 0) then procedure_call_likely
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution would continue after a procedure call. An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is undefined. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs]_{63..0} ≥ 0^GPRLEN
     GPR[31]_{63..0} ← zero_extend(PC + 8)
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more distant addresses.
BGEZL    Branch on Greater Than or Equal to Zero Likely

Bits    31..26   25..21   20..16   15..0
Field   REGIMM   rs       BGEZL    offset
Value   000001            00011
Width   6        5        5        16

MIPS II

Format: BGEZL rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if (rs ≥ 0) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
None
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs]_{63..0} ≥ 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     else
         NullifyCurrentInstruction()
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BGTZ    Branch on Greater Than Zero

Bits    31..26   25..21   20..16   15..0
Field   BGTZ     rs       0        offset
Value   000111            00000
Width   6        5        5        16

MIPS I

Format: BGTZ rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch.
Description: if (rs > 0) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
None
Operation:
I:   tgt_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs]_{63..0} > 0^GPRLEN
I+1: if condition then
         PC ← PC + tgt_offset
     endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BGTZL: Branch on Greater Than Zero Likely (MIPS II)

  Bits    31..26         25..21   20..16    15..0
  Field   BGTZL 010111   rs       0 00000   offset
  Width   6              5        5         16

Format: BGTZL rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if (rs > 0) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 > 0^GPRLEN
  I+1:  if condition then
            PC ← PC + tgt_offset
        else
            NullifyCurrentInstruction()
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BLEZ: Branch on Less Than or Equal to Zero (MIPS I)

  Bits    31..26        25..21   20..16    15..0
  Field   BLEZ 000110   rs       0 00000   offset
  Width   6             5        5         16

Format: BLEZ rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch.
Description: if (rs ≤ 0) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 ≤ 0^GPRLEN
  I+1:  if condition then
            PC ← PC + tgt_offset
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BLEZL: Branch on Less Than or Equal to Zero Likely (MIPS II)

  Bits    31..26         25..21   20..16    15..0
  Field   BLEZL 010110   rs       0 00000   offset
  Width   6              5        5         16

Format: BLEZL rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if (rs ≤ 0) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 ≤ 0^GPRLEN
  I+1:  if condition then
            PC ← PC + tgt_offset
        else
            NullifyCurrentInstruction()
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BLTZ: Branch on Less Than Zero (MIPS I)

  Bits    31..26          25..21   20..16       15..0
  Field   REGIMM 000001   rs       BLTZ 00000   offset
  Width   6               5        5            16

Format: BLTZ rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch.
Description: if (rs < 0) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 < 0^GPRLEN
  I+1:  if condition then
            PC ← PC + tgt_offset
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
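The four compare-with-zero conditions used by the branches in this appendix are each described in terms of the sign bit and the zero test. The sketch below restates those parenthetical definitions as Python predicates; the names MASK64, sign_bit, is_zero, and CONDITIONS are hypothetical and exist only for illustration.

```python
MASK64 = (1 << 64) - 1


def sign_bit(value):
    """Bit 63 of a 64-bit register value."""
    return (value & MASK64) >> 63


def is_zero(value):
    """True when all 64 bits of the register value are zero."""
    return (value & MASK64) == 0


# Each predicate mirrors the parenthetical in the matching description.
CONDITIONS = {
    "BGEZ": lambda v: sign_bit(v) == 0,                     # rs >= 0
    "BGTZ": lambda v: sign_bit(v) == 0 and not is_zero(v),  # rs > 0
    "BLEZ": lambda v: sign_bit(v) == 1 or is_zero(v),       # rs <= 0
    "BLTZ": lambda v: sign_bit(v) == 1,                     # rs < 0
}
```

The "likely" and "and link" variants test the same conditions; they differ only in delay-slot handling and in writing GPR 31.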
BLTZAL: Branch on Less Than Zero and Link (MIPS I)

  Bits    31..26          25..21   20..16         15..0
  Field   REGIMM 000001   rs       BLTZAL 10000   offset
  Width   6               5        5              16

Format: BLTZAL rs, offset
Purpose: To test a GPR then do a PC-relative conditional procedure call.
Description: if (rs < 0) then procedure_call
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch (not the branch itself), where execution would continue after a procedure call. An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch, in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is undefined. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 < 0^GPRLEN
        GPR[31]63..0 ← zero_extend (PC + 8)
  I+1:  if condition then
            PC ← PC + tgt_offset
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more distant addresses.
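As an illustration of the link behaviour, the following Python sketch models BLTZAL on a plain list of register values. It follows the Operation listing, in which GPR 31 is written before the condition is acted on, so the link is stored whether or not the branch is taken. The function names, the register-list representation, and the 32-bit PC are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def bltzal(regs, branch_pc, rs, offset_field):
    """Model BLTZAL; return the PC that follows the delay slot.

    regs is a mutable list of 32 unsigned 64-bit register values.
    """
    if rs == 31:
        # The Restrictions section makes this case undefined; the sketch rejects it.
        raise ValueError("GPR 31 must not be used as rs")
    condition = sign_extend(regs[rs], 64) < 0
    regs[31] = (branch_pc + 8) & MASK32  # return link: second instruction after the branch
    if condition:
        return (branch_pc + 4 + sign_extend(offset_field << 2, 18)) & MASK32
    return (branch_pc + 8) & MASK32
```

Reading rs before writing GPR 31 is what makes the rs = 31 case non-repeatable: a second execution would test the link value instead of the original operand.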
BLTZALL: Branch on Less Than Zero and Link Likely (MIPS II)

  Bits    31..26          25..21   20..16          15..0
  Field   REGIMM 000001   rs       BLTZALL 10010   offset
  Width   6               5        5               16

Format: BLTZALL rs, offset
Purpose: To test a GPR then do a PC-relative conditional procedure call; execute the delay slot only if the branch is taken.
Description: if (rs < 0) then procedure_call_likely
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch (not the branch itself), where execution would continue after a procedure call. An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch, in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is undefined. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 < 0^GPRLEN
        GPR[31]63..0 ← zero_extend (PC + 8)
  I+1:  if condition then
            PC ← PC + tgt_offset
        else
            NullifyCurrentInstruction()
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to more distant addresses.
BLTZL: Branch on Less Than Zero Likely (MIPS II)

  Bits    31..26          25..21   20..16        15..0
  Field   REGIMM 000001   rs       BLTZL 00010   offset
  Width   6               5        5             16

Format: BLTZL rs, offset
Purpose: To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if (rs < 0) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← GPR[rs]63..0 < 0^GPRLEN
  I+1:  if condition then
            PC ← PC + tgt_offset
        else
            NullifyCurrentInstruction()
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BNE: Branch on Not Equal (MIPS I)

  Bits    31..26       25..21   20..16   15..0
  Field   BNE 000101   rs       rt       offset
  Width   6            5        5        16

Format: BNE rs, rt, offset
Purpose: To compare GPRs then do a PC-relative conditional branch.
Description: if (rs ≠ rt) then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← (GPR[rs]63..0 ≠ GPR[rt]63..0)
  I+1:  if condition then
            PC ← PC + tgt_offset
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
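Going in the assembler's direction, the BNE field layout and the delay-slot-relative offset can be combined into an instruction word. The sketch below is an assumption-laden illustration (the function name encode_bne and its error handling are hypothetical); it only packs the fields listed in the encoding above and rejects targets outside the range that the 16-bit field can express.

```python
BNE_OPCODE = 0b000101


def encode_bne(rs, rt, branch_pc, target_pc):
    """Pack a BNE instruction word that branches from branch_pc to target_pc."""
    delta = target_pc - (branch_pc + 4)  # offset is relative to the delay slot
    if delta % 4 != 0:
        raise ValueError("target must be word aligned")
    offset = delta >> 2
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("target outside the conditional branch range")
    return (BNE_OPCODE << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)
```

A target that fails the range check is the case the Programming Notes address: the branch has to be replaced by, or combined with, a J or JR instruction.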
BNEL: Branch on Not Equal Likely (MIPS II)

  Bits    31..26        25..21   20..16   15..0
  Field   BNEL 010101   rs       rt       offset
  Width   6             5        5        16

Format: BNEL rs, rt, offset
Purpose: To compare GPRs then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.
Description: if (rs ≠ rt) then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
Restrictions:
None
Operation:
  I:    tgt_offset ← sign_extend (offset || 0^2)
        condition ← (GPR[rs]63..0 ≠ GPR[rt]63..0)
  I+1:  if condition then
            PC ← PC + tgt_offset
        else
            NullifyCurrentInstruction()
        endif
Exceptions:
None
Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is 128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
BREAK: Breakpoint (MIPS I)

  Bits    31..26           25..6   5..0
  Field   SPECIAL 000000   code    BREAK 001101
  Width   6                20      6

Format: BREAK
Purpose: To cause a Breakpoint exception.
Description:
A breakpoint exception occurs, immediately and unconditionally transferring control to the exception handler. The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
Restrictions:
None
Operation:
SignalException (Breakpoint)
Exceptions:
Breakpoint
Programming Notes:
None
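Because the code field reaches the handler only through the instruction word itself, a handler has to fetch that word and extract bits 25..6. The Python sketch below shows the bit manipulation under that reading of the encoding; encode_break and break_code are hypothetical helper names, and how the handler locates and loads the word is outside the scope of the example.

```python
BREAK_FUNCT = 0b001101
CODE_MASK = (1 << 20) - 1  # the code field is 20 bits wide (bits 25..6)


def encode_break(code):
    """Build a BREAK instruction word carrying a 20-bit software code."""
    return ((code & CODE_MASK) << 6) | BREAK_FUNCT


def break_code(word):
    """Extract the code field from a word already loaded from memory."""
    if (word >> 26) != 0 or (word & 0x3F) != BREAK_FUNCT:
        raise ValueError("not a BREAK instruction")
    return (word >> 6) & CODE_MASK
```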
DADD: Doubleword Add (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DADD 101100
  Width   6                5        5        5        5         6

Format: DADD rd, rs, rt
Purpose: To add 64-bit integers. If overflow occurs, then trap.
Description: rd ← rs + rt
The 64-bit doubleword value in GPR rt is added to the 64-bit value in GPR rs to produce a 64-bit result. If the addition results in 64-bit 2's complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 64-bit result is placed into GPR rd.
Restrictions:
None
Operation:
  temp ← GPR[rs]63..0 + GPR[rt]63..0
  if (64_bit_arithmetic_overflow) then
      SignalException (IntegerOverflow)
  else
      GPR[rd]63..0 ← temp
  endif
Exceptions:
Integer Overflow
Programming Notes:
DADDU performs the same arithmetic operation but does not trap on overflow.
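A small Python model makes the difference between the trapping and non-trapping forms concrete. This is a sketch under stated assumptions: dadd and daddu are hypothetical function names, registers are unsigned 64-bit integers, and the Integer Overflow exception is represented by Python's OverflowError.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret an unsigned 64-bit pattern as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def dadd(rs_value, rt_value):
    """DADD: return the 64-bit sum, or raise on two's-complement overflow."""
    total = to_signed64(rs_value) + to_signed64(rt_value)
    if not -(1 << 63) <= total < (1 << 63):
        # Stands in for SignalException (IntegerOverflow); rd is left unmodified.
        raise OverflowError("Integer Overflow")
    return total & MASK64


def daddu(rs_value, rt_value):
    """DADDU: the same addition, wrapping modulo 2**64 without trapping."""
    return (rs_value + rt_value) & MASK64
```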
DADDI: Doubleword Add Immediate (MIPS III)

  Bits    31..26         25..21   20..16   15..0
  Field   DADDI 011000   rs       rt       immediate
  Width   6              5        5        16

Format: DADDI rt, rs, immediate
Purpose: To add a constant to a 64-bit integer. If overflow occurs, then trap.
Description: rt ← rs + immediate
The 16-bit signed immediate is added to the 64-bit value in GPR rs to produce a 64-bit result. If the addition results in 64-bit 2's complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 64-bit result is placed into GPR rt.
Restrictions:
None
Operation:
  temp ← GPR[rs]63..0 + sign_extend (immediate)
  if (64_bit_arithmetic_overflow) then
      SignalException (IntegerOverflow)
  else
      GPR[rt]63..0 ← temp
  endif
Exceptions:
Integer Overflow
Programming Notes:
DADDIU performs the same arithmetic operation but does not trap on overflow.
DADDIU: Doubleword Add Immediate Unsigned (MIPS III)

  Bits    31..26          25..21   20..16   15..0
  Field   DADDIU 011001   rs       rt       immediate
  Width   6               5        5        16

Format: DADDIU rt, rs, immediate
Purpose: To add a constant to a 64-bit integer.
Description: rt ← rs + immediate
The 16-bit signed immediate is added to the 64-bit value in GPR rs and the 64-bit arithmetic result is placed into GPR rt. No Integer Overflow exception occurs under any circumstances.
Restrictions:
None
Operation:
  GPR[rt]63..0 ← GPR[rs]63..0 + sign_extend (immediate)
Exceptions:
None
Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 64-bit modulo arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
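The "misnomer" is easy to see in a model: the immediate is still sign-extended, and only the overflow trap is absent. The following Python sketch assumes unsigned 64-bit register values; daddiu and sign_extend are hypothetical names used for illustration.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def daddiu(rs_value, immediate):
    """DADDIU: add the sign-extended 16-bit immediate, modulo 2**64."""
    return (rs_value + sign_extend(immediate, 16)) & MASK64
```

An immediate of 0xFFFF therefore subtracts one, which is the behaviour address arithmetic relies on when stepping a pointer backwards.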
DADDU: Doubleword Add Unsigned (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DADDU 101101
  Width   6                5        5        5        5         6

Format: DADDU rd, rs, rt
Purpose: To add 64-bit integers.
Description: rd ← rs + rt
The 64-bit doubleword value in GPR rt is added to the 64-bit value in GPR rs and the 64-bit arithmetic result is placed into GPR rd. No Integer Overflow exception occurs under any circumstances.
Restrictions:
None
Operation:
  GPR[rd]63..0 ← GPR[rs]63..0 + GPR[rt]63..0
Exceptions:
None
Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 64-bit modulo arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
DIV: Divide Word (MIPS I)

  Bits    31..26           25..21   20..16   15..6            5..0
  Field   SPECIAL 000000   rs       rt       0 00 0000 0000   DIV 011010
  Width   6                5        5        10               6

Format: DIV rs, rt
Purpose: To divide 32-bit signed integers.
Description: (LO, HI) ← rs / rt
The 32-bit word value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands as signed values. The 32-bit quotient is placed into special register LO and the 32-bit remainder is placed into special register HI. No arithmetic exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined. If the divisor in GPR rt is zero, the arithmetic result value is undefined.
Operation:
  if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then
      UndefinedResult()
  endif
  q ← GPR[rs]31..0 div GPR[rt]31..0
  LO63..0 ← sign_extend (q31..0)
  r ← GPR[rs]31..0 mod GPR[rt]31..0
  HI63..0 ← sign_extend (r31..0)
Exceptions:
None
Supplementary Explanation:
Normally, dividing the signed minimum value 0x80000000 (-2147483648) by 0xFFFFFFFF (-1) results in an overflow. In this instruction, however, no overflow exception occurs, and the result is as follows: the quotient is 0x80000000 (-2147483648) and the remainder is 0x00000000 (0). The signs of the quotient and the remainder are determined by the signs of the dividend and the divisor, as shown in the table below:

  Dividend   Divisor    Quotient   Remainder
  Positive   Positive   Positive   Positive
  Positive   Negative   Negative   Positive
  Negative   Positive   Negative   Negative
  Negative   Negative   Positive   Negative

Programming Notes:
In the C790, the integer divide operation proceeds asynchronously and allows other CPU instructions to execute before it is retired. An attempt to read LO or HI before the results are written will wait (interlock) until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the divide so that other instructions can execute in parallel.

No arithmetic exception occurs under any circumstances. If divide-by-zero or overflow conditions must be detected and acted on, the divide instruction is typically followed by additional instructions that check for a zero divisor and/or for overflow. If the divide is asynchronous, the zero-divisor check can execute in parallel with the divide. The action taken on either divide-by-zero or overflow is a convention either within the program itself or, more typically, within the system software; one possibility is to take a BREAK exception with a code field value to signal the problem to the system software.

As an example, the C programming language in a UNIX environment expects division by zero either to terminate the program or to execute a program-specified signal handler. C does not expect overflow to cause any exceptional condition. If the C compiler uses a divide instruction, it also emits code to test for a zero divisor and execute a BREAK instruction to inform the operating system if one is detected.

In the C790, sign-extended 32-bit values (bits 63..31) are ignored on divide operations.
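The sign table and the 0x80000000 / 0xFFFFFFFF case can be checked with a short Python model of a truncating signed divide. This is a sketch, not the hardware algorithm: div_word is a hypothetical name, the zero-divisor case (undefined in hardware) is rejected with an exception purely for the sake of the example, and LO and HI are returned as unsigned 64-bit patterns.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def div_word(rs_value, rt_value):
    """Return (LO, HI) for a signed 32-bit divide that truncates toward zero."""
    dividend = sign_extend(rs_value, 32)
    divisor = sign_extend(rt_value, 32)
    if divisor == 0:
        raise ZeroDivisionError("result is undefined for a zero divisor")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor  # takes the sign of the dividend
    lo = sign_extend(quotient & MASK32, 32) & MASK64
    hi = sign_extend(remainder & MASK32, 32) & MASK64
    return lo, hi
```

Truncating the quotient to 32 bits before sign extension is what turns the mathematically correct +2147483648 into 0x80000000 without any exception.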
DIVU: Divide Unsigned Word (MIPS I)

  Bits    31..26           25..21   20..16   15..6            5..0
  Field   SPECIAL 000000   rs       rt       0 00 0000 0000   DIVU 011011
  Width   6                5        5        10               6

Format: DIVU rs, rt
Purpose: To divide 32-bit unsigned integers.
Description: (LO, HI) ← rs / rt
The 32-bit word value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands as unsigned values. The 32-bit quotient is placed into special register LO and the 32-bit remainder is placed into special register HI. No arithmetic exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined. If the divisor in GPR rt is zero, the arithmetic result is undefined.
Operation:
  if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then
      UndefinedResult()
  endif
  q ← (0 || GPR[rs]31..0) div (0 || GPR[rt]31..0)
  LO63..0 ← sign_extend (q31..0)
  r ← (0 || GPR[rs]31..0) mod (0 || GPR[rt]31..0)
  HI63..0 ← sign_extend (r31..0)
Exceptions:
None
Programming Notes:
See the Programming Notes for the DIV instruction.
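For comparison with DIV, the unsigned form can be modelled as below. The operands are treated as unsigned 32-bit values, yet the Operation listing still sign-extends the 32-bit results into LO and HI, so a quotient with bit 31 set appears with the upper 32 bits set. The name divu_word and the exception used for a zero divisor are assumptions made for this sketch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Sign-extend a 32-bit pattern to an unsigned 64-bit pattern."""
    value &= MASK32
    return (value | ~MASK32) & MASK64 if value >> 31 else value


def divu_word(rs_value, rt_value):
    """Return (LO, HI) for an unsigned 32-bit divide."""
    dividend = rs_value & MASK32
    divisor = rt_value & MASK32
    if divisor == 0:
        raise ZeroDivisionError("result is undefined for a zero divisor")
    return sign_extend32(dividend // divisor), sign_extend32(dividend % divisor)
```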
DSLL: Doubleword Shift Left Logical (MIPS III)

  Bits    31..26           25..21    20..16   15..11   10..6   5..0
  Field   SPECIAL 000000   0 00000   rt       rd       sa      DSLL 111000
  Width   6                5         5        5        5       6

Format: DSLL rd, rt, sa
Purpose: To left shift a doubleword by a fixed amount 0 to 31 bits.
Description: rd ← rt << sa
The 64-bit doubleword contents of GPR rt are shifted left, inserting zeros into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 31 is specified by sa.
Restrictions:
None
Operation:
  s ← 0 || sa
  GPR[rd]63..0 ← GPR[rt](63-s)..0 || 0^s
Exceptions:
None
Programming Notes:
None
DSLL32: Doubleword Shift Left Logical Plus 32 (MIPS III)

  Bits    31..26           25..21    20..16   15..11   10..6   5..0
  Field   SPECIAL 000000   0 00000   rt       rd       sa      DSLL32 111100
  Width   6                5         5        5        5       6

Format: DSLL32 rd, rt, sa
Purpose: To left shift a doubleword by a fixed amount 32 to 63 bits.
Description: rd ← rt << (sa + 32)
The 64-bit doubleword contents of GPR rt are shifted left, inserting zeros into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 32 to 63 is specified by sa + 32.
Restrictions:
None
Operation:
  s ← 1 || sa    /* 32 + sa */
  GPR[rd]63..0 ← GPR[rt](63-s)..0 || 0^s
Exceptions:
None
Programming Notes:
None
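The 5-bit sa field can express only 0 to 31, which is why the fixed-count doubleword shifts come in pairs. The Python sketch below models DSLL and DSLL32 with the six-bit count s formed as 0 || sa and 1 || sa respectively; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def dsll(rt_value, sa):
    """DSLL: shift left by s = 0 || sa (0 to 31 bits)."""
    s = sa & 0x1F
    return (rt_value << s) & MASK64


def dsll32(rt_value, sa):
    """DSLL32: shift left by s = 1 || sa, that is 32 + sa (32 to 63 bits)."""
    s = 0x20 | (sa & 0x1F)
    return (rt_value << s) & MASK64
```

An assembler can pick between the two forms from a single 0 to 63 shift amount by testing bit 5 of the count.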
DSLLV: Doubleword Shift Left Logical Variable (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DSLLV 010100
  Width   6                5        5        5        5         6

Format: DSLLV rd, rt, rs
Purpose: To left shift a doubleword by a variable number of bits.
Description: rd ← rt << rs
The 64-bit doubleword contents of GPR rt are shifted left, inserting zeros into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 63 is specified by the low-order six bits in GPR rs.
Restrictions:
None
Operation:
  s ← 0 || GPR[rs]5..0
  GPR[rd]63..0 ← GPR[rt](63-s)..0 || 0^s
Exceptions:
None
Programming Notes:
None
DSRA: Doubleword Shift Right Arithmetic (MIPS III)

  Bits    31..26           25..21    20..16   15..11   10..6   5..0
  Field   SPECIAL 000000   0 00000   rt       rd       sa      DSRA 111011
  Width   6                5         5        5        5       6

Format: DSRA rd, rt, sa
Purpose: To arithmetic right shift a doubleword by a fixed amount 0 to 31 bits.
Description: rd ← rt >> sa (arithmetic)
The 64-bit doubleword contents of GPR rt are shifted right, duplicating the sign bit (63) into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 31 is specified by sa.
Restrictions:
None
Operation:
  s ← 0 || sa
  GPR[rd]63..0 ← (GPR[rt]63)^s || GPR[rt]63..s
Exceptions:
None
Programming Notes:
None
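The Operation line builds the result from s copies of bit 63 followed by the surviving high-order bits. The Python sketch below constructs the result the same way rather than relying on a language-level arithmetic shift; dsra is a hypothetical function name and the register value is an unsigned 64-bit integer.

```python
MASK64 = (1 << 64) - 1


def dsra(rt_value, sa):
    """DSRA: arithmetic right shift by 0 to 31 bits."""
    s = sa & 0x1F
    value = rt_value & MASK64
    kept = value >> s                       # GPR[rt]63..s
    sign = value >> 63                      # GPR[rt]63
    fill = ((1 << s) - 1) << (64 - s) if sign else 0  # s copies of the sign bit
    return (fill | kept) & MASK64
```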
DSRA32: Doubleword Shift Right Arithmetic Plus 32 (MIPS III)

  Bits    31..26           25..21    20..16   15..11   10..6   5..0
  Field   SPECIAL 000000   0 00000   rt       rd       sa      DSRA32 111111
  Width   6                5         5        5        5       6

Format: DSRA32 rd, rt, sa
Purpose: To arithmetic right shift a doubleword by a fixed amount 32 to 63 bits.
Description: rd ← rt >> (sa + 32) (arithmetic)
The doubleword contents of GPR rt are shifted right, duplicating the sign bit (63) into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 32 to 63 is specified by sa + 32.
Restrictions:
None
Operation:
  s ← 1 || sa    /* 32 + sa */
  GPR[rd]63..0 ← (GPR[rt]63)^s || GPR[rt]63..s
Exceptions:
None
Programming Notes:
None
DSRAV: Doubleword Shift Right Arithmetic Variable (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DSRAV 010111
  Width   6                5        5        5        5         6

Format: DSRAV rd, rt, rs
Purpose: To arithmetic right shift a doubleword by a variable number of bits.
Description: rd ← rt >> rs (arithmetic)
The doubleword contents of GPR rt are shifted right, duplicating the sign bit (63) into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 63 is specified by the low-order six bits in GPR rs.
Restrictions:
None
Operation:
  s ← GPR[rs]5..0
  GPR[rd]63..0 ← (GPR[rt]63)^s || GPR[rt]63..s
Exceptions:
None
Programming Notes:
None
DSRL: Doubleword Shift Right Logical (MIPS III)

  Bits    31..26           25..21    20..16   15..11   10..6   5..0
  Field   SPECIAL 000000   0 00000   rt       rd       sa      DSRL 111010
  Width   6                5         5        5        5       6

Format: DSRL rd, rt, sa
Purpose: To logical right shift a doubleword by a fixed amount 0 to 31 bits.
Description: rd ← rt >> sa (logical)
The doubleword contents of GPR rt are shifted right, inserting zeros into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 31 is specified by sa.
Restrictions:
None
Operation:
  s ← 0 || sa
  GPR[rd]63..0 ← 0^s || GPR[rt]63..s
Exceptions:
None
Programming Notes:
None
DSRL32: Doubleword Shift Right Logical Plus 32 (MIPS III)

  Bits    31..26           25..21    20..16   15..11   10..6   5..0
  Field   SPECIAL 000000   0 00000   rt       rd       sa      DSRL32 111110
  Width   6                5         5        5        5       6

Format: DSRL32 rd, rt, sa
Purpose: To logical right shift a doubleword by a fixed amount 32 to 63 bits.
Description: rd ← rt >> (sa + 32) (logical)
The 64-bit doubleword contents of GPR rt are shifted right, inserting zeros into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 32 to 63 is specified by sa + 32.
Restrictions:
None
Operation:
  s ← 1 || sa    /* 32 + sa */
  GPR[rd]63..0 ← 0^s || GPR[rt]63..s
Exceptions:
None
Programming Notes:
None
DSRLV: Doubleword Shift Right Logical Variable (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DSRLV 010110
  Width   6                5        5        5        5         6

Format: DSRLV rd, rt, rs
Purpose: To logical right shift a doubleword by a variable number of bits.
Description: rd ← rt >> rs (logical)
The 64-bit doubleword contents of GPR rt are shifted right, inserting zeros into the emptied bits; the result is placed in GPR rd. The bit shift count in the range 0 to 63 is specified by the low-order six bits in GPR rs.
Restrictions:
None
Operation:
  s ← GPR[rs]5..0
  GPR[rd]63..0 ← 0^s || GPR[rt]63..s
Exceptions:
None
Programming Notes:
None
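For the variable shifts, only the low-order six bits of GPR rs contribute to the count; any higher bits are ignored. A minimal Python sketch of DSRLV under that rule follows (dsrlv is a hypothetical name, and register values are unsigned 64-bit integers).

```python
MASK64 = (1 << 64) - 1


def dsrlv(rt_value, rs_value):
    """DSRLV: logical right shift by the low six bits of rs (0 to 63)."""
    s = rs_value & 0x3F
    return (rt_value & MASK64) >> s
```

A count register holding 64 therefore shifts by zero bits, not by 64, which differs from what a naive "shift by rs" reading would suggest.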
DSUB: Doubleword Subtract (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DSUB 101110
  Width   6                5        5        5        5         6

Format: DSUB rd, rs, rt
Purpose: To subtract 64-bit integers; trap if overflow.
Description: rd ← rs - rt
The 64-bit doubleword value in GPR rt is subtracted from the 64-bit value in GPR rs to produce a 64-bit result. If the subtraction results in 64-bit 2's complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 64-bit result is placed into GPR rd.
Restrictions:
None
Operation:
  temp ← GPR[rs]63..0 - GPR[rt]63..0
  if (64_bit_arithmetic_overflow) then
      SignalException (IntegerOverflow)
  else
      GPR[rd]63..0 ← temp
  endif
Exceptions:
Integer Overflow
Programming Notes:
DSUBU performs the same arithmetic operation but does not trap on overflow.
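Subtraction overflow can be detected from sign bits alone: it occurs exactly when the operands have different signs and the result's sign differs from that of rs. The Python sketch below applies that well-known test to 64-bit patterns. It is an illustration, not a statement about how the C790 detects overflow; dsub is a hypothetical name and OverflowError stands in for the Integer Overflow exception.

```python
MASK64 = (1 << 64) - 1


def dsub(rs_value, rt_value):
    """DSUB: 64-bit subtract that raises on two's-complement overflow."""
    a = rs_value & MASK64
    b = rt_value & MASK64
    result = (a - b) & MASK64
    # Overflow: operand signs differ and the result sign differs from rs.
    if ((a ^ b) & (a ^ result)) >> 63:
        raise OverflowError("Integer Overflow")
    return result
```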
DSUBU: Doubleword Subtract Unsigned (MIPS III)

  Bits    31..26           25..21   20..16   15..11   10..6     5..0
  Field   SPECIAL 000000   rs       rt       rd       0 00000   DSUBU 101111
  Width   6                5        5        5        5         6

Format: DSUBU rd, rs, rt
Purpose: To subtract 64-bit integers.
Description: rd ← rs - rt
The 64-bit doubleword value in GPR rt is subtracted from the 64-bit value in GPR rs and the 64-bit arithmetic result is placed into GPR rd. No Integer Overflow exception occurs under any circumstances.
Restrictions:
None
Operation:
  GPR[rd]63..0 ← GPR[rs]63..0 - GPR[rt]63..0
Exceptions:
None
Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 64-bit modulo arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
J: Jump (MIPS I)

  Bits    31..26     25..0
  Field   J 000010   instr_index
  Width   6          26

Format: J target
Purpose: To branch within the current 256 MB aligned region.
Description:
This is a PC-region branch (not PC-relative); the effective target address is in the "current" 256 MB aligned region. The low 28 bits of the target address are the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the jump itself). Jump to the effective target address. Execute the instruction following the jump, in the branch delay slot, before jumping.
Restrictions:
None
Operation:
  I:
  I+1:  PC ← PC31..28 || instr_index || 0^2
Exceptions:
None
Programming Notes:
Forming the branch target address by concatenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned on a 256 MB boundary. It allows a branch to anywhere in the region from anywhere in the region which a signed relative offset would not allow. This definition creates the boundary case where the branch instruction is in the last word of a 256 MB region and can therefore only branch to the following 256 MB region containing the branch delay slot.
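The region-based target formation, including the boundary case mentioned above, can be sketched in a few lines of Python. The function name j_target is hypothetical and the PC is modelled as a 32-bit value, matching the PC31..28 notation in the Operation listing.

```python
MASK32 = 0xFFFFFFFF


def j_target(jump_pc, instr_index):
    """Target of J/JAL: delay-slot PC bits 31..28, then instr_index << 2."""
    delay_slot_pc = (jump_pc + 4) & MASK32
    region = delay_slot_pc & 0xF0000000
    return region | ((instr_index & 0x03FFFFFF) << 2)
```

A jump located in the last word of a region (for example at 0x0FFFFFFC) has its delay slot in the next region, so its target is formed from the next region's upper bits.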
JAL: Jump and Link (MIPS I)

  Bits    31..26       25..0
  Field   JAL 000011   instr_index
  Width   6            26

Format: JAL target
Purpose: To procedure call within the current 256 MB aligned region.
Description:
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution would continue after a procedure call. This is a PC-region branch (not PC-relative); the effective target address is in the "current" 256 MB aligned region. The low 28 bits of the target address are the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the jump itself). Jump to the effective target address. Execute the instruction following the jump, in the branch delay slot, before jumping.
Restrictions:
None
Operation:
  I:    GPR[31]63..0 ← zero_extend (PC + 8)
  I+1:  PC ← PC31..28 || instr_index || 0^2
Exceptions:
None
Programming Notes:
Forming the branch target address by concatenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned on a 256 MB boundary. It allows a branch to anywhere in the region from anywhere in the region which a signed relative offset would not allow. This definition creates the boundary case where the branch instruction is in the last word of a 256 MB region and can therefore only branch to the following 256 MB region containing the branch delay slot.
JALR
31 26 25 21 20
Jump and Link Register 16 15 11 10 65 0
JALR
JALR 001001
6
SPECIAL 000000
6
rs
5
0 00000
5
rd
5
0 00000
5
MIPS I
Format: JALR rs JALR rd, rs Purpose: Description: To procedure call to an instruction address in a register. rd return_addr, PC rs (rd = 31 implied)
Place the return address link in GPR rd. The return link is the address of the second instruction following the branch, where execution would continue after a procedure call. Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot, before jumping.
Restrictions:
Register specifiers rs and rd must not be equal, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is undefined. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot. The effective target address in GPR rs must be naturally aligned. If either of the two least-significant bits are not -zero, then an Address Error exception occurs, not for the jump instruction, but when the branch target is subsequently fetched as an instruction.
Operation:
I:   temp ← GPR[rs]31..0
     GPR[rd]63..0 ← zero_extend (PC + 8)
I+1: PC ← temp
Exceptions:
None
Programming Notes:
This is the only branch-and-link instruction that can select a register for the return link; all other link instructions use GPR 31. The default register for GPR rd, if omitted in the assembly language instruction, is GPR 31.
JR    Jump Register    MIPS I

Bits:    31..26           25..21   20..6                   5..0
Field:   SPECIAL 000000   rs       0 000 0000 0000 0000    JR 001000
Width:   6                5        15                      6

Format:      JR rs
Purpose:     To branch to an instruction address in a register.
Description: PC ← rs
Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot, before jumping.
Restrictions:
The effective target address in GPR rs must be naturally aligned. If either of the two least-significant bits is not zero, then an Address Error exception occurs, not for the jump instruction, but when the branch target is subsequently fetched as an instruction.
Operation:
I:   temp ← GPR[rs]31..0
I+1: PC ← temp
Exceptions:
None
Programming Notes:
None
LB    Load Byte    MIPS I

Bits:    31..26      25..21   20..16   15..0
Field:   LB 100000   base     rt       offset
Width:   6           5        5        16

Format:      LB rt, offset (base)
Purpose:     To load a byte from memory as a signed value.
Description: rt ← memory [base + offset]
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
memquad ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
byte ← vAddr3..0 xor BigEndian^4
GPR[rt]63..0 ← sign_extend (memquad(7+8*byte)..8*byte)
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LBU    Load Byte Unsigned    MIPS I

Bits:    31..26       25..21   20..16   15..0
Field:   LBU 100100   base     rt       offset
Width:   6            5        5        16

Format:      LBU rt, offset (base)
Purpose:     To load a byte from memory as an unsigned value.
Description: rt ← memory [base + offset]
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
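The difference between the signed (LB) and unsigned (LBU) byte loads is only in how the fetched byte is widened to 64 bits. A minimal C sketch follows; the function names lb_result and lbu_result are hypothetical, and the cast to int8_t assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/* Hypothetical model of the value LB places in GPR rt:
   the fetched byte is sign-extended to 64 bits. */
static uint64_t lb_result(uint8_t byte)
{
    return (uint64_t)(int64_t)(int8_t)byte;
}

/* Hypothetical model of the value LBU places in GPR rt:
   the fetched byte is zero-extended to 64 bits. */
static uint64_t lbu_result(uint8_t byte)
{
    return (uint64_t)byte;
}
```

For a byte of 0x80, the first function yields 0xFFFFFFFFFFFFFF80 and the second yields 0x0000000000000080.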
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
memquad ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
byte ← vAddr3..0 xor BigEndian^4
GPR[rt]63..0 ← zero_extend (memquad(7+8*byte)..8*byte)
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LD    Load Doubleword    MIPS III

Bits:    31..26      25..21   20..16   15..0
Field:   LD 110111   base     rt       offset
Width:   6           5        5        16

Format:      LD rt, offset (base)
Purpose:     To load a doubleword from memory.
Description: rt ← memory [base + offset]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If any of the three least-significant bits of the effective address are non-zero, an Address Error exception occurs.
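The natural-alignment rule used by the aligned loads (halfword, word, doubleword) amounts to testing the low address bits. The sketch below is illustrative only; is_misaligned is a hypothetical name, and size is assumed to be a power of two (2, 4 or 8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true when vaddr is not a multiple of size,
   i.e. when any of the low log2(size) address bits is non-zero.
   size must be a power of two (2 = halfword, 4 = word, 8 = doubleword). */
static bool is_misaligned(uint32_t vaddr, uint32_t size)
{
    return (vaddr & (size - 1u)) != 0;
}
```

A doubleword load from 0x1004 would be reported as misaligned by this check, while one from 0x1008 would not.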
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr2..0) ≠ 0^3 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian || 0^3))
byte ← vAddr3..0 xor (BigEndian || 0^3)
memquad ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
GPR[rt]63..0 ← memquad(63+8*byte)..8*byte
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LDL    Load Doubleword Left    MIPS III

Bits:    31..26       25..21   20..16   15..0
Field:   LDL 011010   base     rt       offset
Width:   6            5        5        16

Format:      LDL rt, offset (base)
Purpose:     To load the more-significant part of a doubleword from an unaligned memory address.
Description: rt ← rt MERGE memory [base + offset]
Paired LDL and LDR instructions are used to load a register with a doubleword from eight consecutive bytes in memory starting at an arbitrary byte address. LDL loads the left (most-significant) bytes and LDR loads the right (least-significant) bytes.
The instruction adds the 16-bit signed offset to the contents of GPR base to form the effective address. This is the address of the most-significant byte of a doubleword composed of eight consecutive bytes in memory. LDL loads from one to eight bytes, the most-significant bytes of the doubleword, into the corresponding bytes of GPR rt. It loads the bytes of the target doubleword that also lie in the aligned doubleword containing the byte specified by the effective address.
Conceptually, it starts at the specified byte in memory and loads that byte into the high-order (left-most) byte of the register; then it loads bytes from memory into the register until it reaches the low-order byte of the doubleword in memory. The least-significant (right-most) byte(s) of the register are not changed.
Example, little-endian memory (each memory byte is labelled with its address):
  address 8:                          15 14 13 12 11 10  9  8
  address 0:                           7  6  5  4  3  2  1  0
  register $24 before:                 H  G  F  E  D  C  B  A
  register $24 after LDL $24,11 ($0): 11 10  9  8  D  C  B  A

Example, big-endian memory (each memory byte is labelled with its address):
  address 8:                           8  9 10 11 12 13 14 15
  address 0:                           0  1  2  3  4  5  6  7
  register $24 before:                 A  B  C  D  E  F  G  H
  register $24 after LDL $24,3 ($0):   3  4  5  6  7  F  G  H

The contents of GPR rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LDL (or LDR) instruction which also specifies register rt.
No address exceptions due to alignment are possible.
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 0) then pAddr ← pAddr(PSIZE-1)..3 || 0^3 endif
byte ← 0 || (vAddr2..0 xor BigEndian^3)
doubleword ← vAddr3 xor BigEndian
memquad ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
GPR[rt]63..0 ← memquad(7+8*byte+64*doubleword)..(64*doubleword) || GPR[rt](55-8*byte)..0
Given a doubleword in a register and a doubleword in memory, the operation of LDL is as follows:

Register (MSB, bit 63, on the left; LSB, bit 0, on the right):  a b c d e f g h

Little-endian memory:
  Address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:      I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Little-endian byte ordering (BigEndianCPU = 0). Unchanged register bytes are shown as lowercase letters.

  vAddr3..0   Destination register (63..0)   Type   Offset LEM   Offset BEM
   0          X b c d e f g h                0      0            15
   1          W X c d e f g h                1      0            14
   2          V W X d e f g h                2      0            13
   3          U V W X e f g h                3      0            12
   4          T U V W X f g h                4      0            11
   5          S T U V W X g h                5      0            10
   6          R S T U V W X h                6      0             9
   7          Q R S T U V W X                7      0             8
   8          P b c d e f g h                0      8             7
   9          O P c d e f g h                1      8             6
  10          N O P d e f g h                2      8             5
  11          M N O P e f g h                3      8             4
  12          L M N O P f g h                4      8             3
  13          K L M N O P g h                5      8             2
  14          J K L M N O P h                6      8             1
  15          I J K L M N O P                7      8             0

Big-endian memory:
  Big-endian address:      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  Little-endian address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:                    I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Big-endian byte ordering (BigEndianCPU = 1). Unchanged register bytes are shown as lowercase letters.

  vAddr3..0   Destination register (63..0)   Type   Offset LEM   Offset BEM
   0          I J K L M N O P                7      0             0
   1          J K L M N O P h                6      0             1
   2          K L M N O P g h                5      0             2
   3          L M N O P f g h                4      0             3
   4          M N O P e f g h                3      0             4
   5          N O P d e f g h                2      0             5
   6          O P c d e f g h                1      0             6
   7          P b c d e f g h                0      0             7
   8          Q R S T U V W X                7      8             8
   9          R S T U V W X h                6      8             9
  10          S T U V W X g h                5      8            10
  11          T U V W X f g h                4      8            11
  12          U V W X e f g h                3      8            12
  13          V W X d e f g h                2      8            13
  14          W X c d e f g h                1      8            14
  15          X b c d e f g h                0      8            15

  LEM     Little-endian memory (BigEndian = 0)
  BEM     BigEndian = 1
  Type    AccessLength sent to memory
  Offset  pAddr3..0 sent to memory

Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LDR    Load Doubleword Right    MIPS III

Bits:    31..26       25..21   20..16   15..0
Field:   LDR 011011   base     rt       offset
Width:   6            5        5        16

Format:      LDR rt, offset (base)
Purpose:     To load the less-significant part of a doubleword from an unaligned memory address.
Description: rt ← rt MERGE memory [base + offset]
Paired LDL and LDR instructions are used to load a register with a doubleword from eight consecutive bytes in memory starting at an arbitrary byte address. LDL loads the left (most-significant) bytes and LDR loads the right (least-significant) bytes.
The instruction adds the 16-bit signed offset to the contents of GPR base to form the effective address. This is the address of the least-significant byte of a doubleword composed of eight consecutive bytes in memory. LDR loads from one to eight bytes, the least-significant bytes of the doubleword, into the corresponding bytes of GPR rt. It loads the bytes of the target doubleword that also lie in the aligned doubleword containing the byte specified by the effective address.
Conceptually, it starts at the specified byte in memory and loads that byte into the low-order (right-most) byte of the register; then it loads bytes from memory into the register until it reaches the high-order byte of the doubleword in memory. The most-significant (left-most) byte(s) of the register are not changed.
Example, little-endian memory (each memory byte is labelled with its address):
  address 8:                          15 14 13 12 11 10  9  8
  address 0:                           7  6  5  4  3  2  1  0
  register $24 before:                 H  G  F  E  D  C  B  A
  register $24 after LDR $24,4 ($0):   H  G  F  E  7  6  5  4

Example, big-endian memory (each memory byte is labelled with its address):
  address 8:                           8  9 10 11 12 13 14 15
  address 0:                           0  1  2  3  4  5  6  7
  register $24 before:                 A  B  C  D  E  F  G  H
  register $24 after LDR $24,4 ($0):   A  B  C  0  1  2  3  4

The contents of GPR rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LDR (or LDL) instruction which also specifies register rt.
No address exceptions due to alignment are possible.
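The merge behaviour of the LDL/LDR pair can be modelled in C for the little-endian case. This is a sketch under stated assumptions, not the processor's implementation: memory is a plain byte array, only little-endian ordering is modelled, and the names read_aligned_dw, ldl_le, ldr_le and load_unaligned_dw_le are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the aligned little-endian doubleword that contains byte addr. */
static uint64_t read_aligned_dw(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)7;
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | mem[base + i];
    return v;
}

/* LDL model: the byte at addr goes to the high-order register byte,
   followed by the lower-addressed bytes of the same aligned doubleword. */
static uint64_t ldl_le(uint64_t rt, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (7u - (unsigned)(addr & 7));
    uint64_t keep = ((uint64_t)1 << shift) - 1;   /* low bytes left unchanged */
    return (rt & keep) | (read_aligned_dw(mem, addr) << shift);
}

/* LDR model: the byte at addr goes to the low-order register byte,
   followed by the higher-addressed bytes of the same aligned doubleword. */
static uint64_t ldr_le(uint64_t rt, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (unsigned)(addr & 7);
    uint64_t keep = ~(~(uint64_t)0 >> shift);     /* high bytes left unchanged */
    return (rt & keep) | (read_aligned_dw(mem, addr) >> shift);
}

/* Unaligned doubleword load: LDL at addr + 7, then LDR at addr. */
static uint64_t load_unaligned_dw_le(const uint8_t *mem, size_t addr)
{
    uint64_t rt = 0;
    rt = ldl_le(rt, mem, addr + 7);
    rt = ldr_le(rt, mem, addr);
    return rt;
}
```

With memory bytes equal to their own addresses and a register holding the bytes H G F E D C B A, ldl_le at address 11 gives 11 10 9 8 D C B A and ldr_le at address 4 gives H G F E 7 6 5 4, matching the little-endian examples above.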
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 1) then pAddr ← pAddr(PSIZE-1)..3 || 0^3 endif
byte ← 0 || (vAddr2..0 xor BigEndian^3)
doubleword ← vAddr3 xor BigEndian
memquad ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
GPR[rt]63..0 ← GPR[rt]63..(64-8*byte) || memquad(63+64*doubleword)..(64*doubleword+8*byte)
Given a doubleword in a register and a doubleword in memory, the operation of LDR is as follows:

Register (MSB, bit 63, on the left; LSB, bit 0, on the right):  a b c d e f g h

Little-endian memory:
  Address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:      I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Little-endian byte ordering (BigEndianCPU = 0). Unchanged register bytes are shown as lowercase letters.

  vAddr3..0   Destination register (63..0)   Type   Offset LEM   Offset BEM
   0          Q R S T U V W X                7       0           0
   1          a Q R S T U V W                6       1           0
   2          a b Q R S T U V                5       2           0
   3          a b c Q R S T U                4       3           0
   4          a b c d Q R S T                3       4           0
   5          a b c d e Q R S                2       5           0
   6          a b c d e f Q R                1       6           0
   7          a b c d e f g Q                0       7           0
   8          I J K L M N O P                7       8           0
   9          a I J K L M N O                6       9           0
  10          a b I J K L M N                5      10           0
  11          a b c I J K L M                4      11           0
  12          a b c d I J K L                3      12           0
  13          a b c d e I J K                2      13           0
  14          a b c d e f I J                1      14           0
  15          a b c d e f g I                0      15           0

Big-endian memory:
  Big-endian address:      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  Little-endian address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:                    I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Big-endian byte ordering (BigEndianCPU = 1). Unchanged register bytes are shown as lowercase letters.

  vAddr3..0   Destination register (63..0)   Type   Offset LEM   Offset BEM
   0          a b c d e f g I                0      15           0
   1          a b c d e f I J                1      14           0
   2          a b c d e I J K                2      13           0
   3          a b c d I J K L                3      12           0
   4          a b c I J K L M                4      11           0
   5          a b I J K L M N                5      10           0
   6          a I J K L M N O                6       9           0
   7          I J K L M N O P                7       8           0
   8          a b c d e f g Q                0       7           0
   9          a b c d e f Q R                1       6           0
  10          a b c d e Q R S                2       5           0
  11          a b c d Q R S T                3       4           0
  12          a b c Q R S T U                4       3           0
  13          a b Q R S T U V                5       2           0
  14          a Q R S T U V W                6       1           0
  15          Q R S T U V W X                7       0           0

  LEM     Little-endian memory (BigEndianMem = 0)
  BEM     BigEndianMem = 1
  Type    AccessLength sent to memory
  Offset  pAddr2..0 sent to memory

Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LH    Load Halfword    MIPS I

Bits:    31..26      25..21   20..16   15..0
Field:   LH 100001   base     rt       offset
Width:   6           5        5        16

Format:      LH rt, offset (base)
Purpose:     To load a halfword from memory as a signed value.
Description: rt ← memory [base + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr0) ≠ 0 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian^3 || 0))
memquad ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr3..0 xor (BigEndian^3 || 0)
GPR[rt]63..0 ← sign_extend (memquad(15+8*byte)..8*byte)
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LHU    Load Halfword Unsigned    MIPS I

Bits:    31..26       25..21   20..16   15..0
Field:   LHU 100101   base     rt       offset
Width:   6            5        5        16

Format:      LHU rt, offset (base)
Purpose:     To load a halfword from memory as an unsigned value.
Description: rt ← memory [base + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr0) ≠ 0 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian^3 || 0))
memquad ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr3..0 xor (BigEndian^3 || 0)
GPR[rt]63..0 ← zero_extend (memquad(15+8*byte)..8*byte)
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LUI    Load Upper Immediate    MIPS I

Bits:    31..26       25..21    20..16   15..0
Field:   LUI 001111   0 00000   rt       immediate
Width:   6            5         5        16

Format:      LUI rt, immediate
Purpose:     To load a constant into the upper half of a word.
Description: rt ← immediate || 0^16
The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. The 32-bit result is sign-extended and placed into GPR rt.
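The shift-then-sign-extend step can be written out in C as a short sketch. The name lui_result is hypothetical, and the cast to int32_t assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/* Hypothetical model of the value LUI places in GPR rt. */
static uint64_t lui_result(uint16_t immediate)
{
    uint32_t word = (uint32_t)immediate << 16;   /* immediate || 0^16 */
    return (uint64_t)(int64_t)(int32_t)word;     /* sign-extend bit 31 to 64 bits */
}
```

An immediate of 0x8000 therefore produces 0xFFFFFFFF80000000 rather than 0x0000000080000000, because bit 31 of the 32-bit result is propagated into the upper half of the register.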
Restrictions:
None
Operation:
GPR[rt]63..0 ← sign_extend (immediate || 0^16)
Exceptions:
None
Programming Notes:
None
LW    Load Word    MIPS I

Bits:    31..26      25..21   20..16   15..0
Field:   LW 100011   base     rt       offset
Width:   6           5        5        16

Format:      LW rt, offset (base)
Purpose:     To load a word from memory as a signed value.
Description: rt ← memory [base + offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If either of the two least-significant bits of the address are non-zero, an Address Error exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr1..0) ≠ 0^2 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian^2 || 0^2))
memquad ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
byte ← vAddr3..0 xor (BigEndian^2 || 0^2)
GPR[rt]63..0 ← sign_extend (memquad(31+8*byte)..8*byte)
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
LWL    Load Word Left    MIPS I

Bits:    31..26       25..21   20..16   15..0
Field:   LWL 100010   base     rt       offset
Width:   6            5        5        16

Format:      LWL rt, offset (base)
Purpose:     To load the more-significant part of a word from an unaligned memory address as a signed value.
Description: rt ← rt MERGE memory [base + offset]
Paired LWL and LWR instructions are used to load a register with a word from four consecutive bytes in memory starting at an arbitrary byte address. LWL loads the left (most-significant) bytes and LWR loads the right (least-significant) bytes.
The instruction adds the 16-bit signed offset to the contents of GPR base to form the effective address. This is the address of the most-significant byte of a word composed of four consecutive bytes in memory. LWL loads from one to four bytes, the most-significant bytes of the word, into the corresponding bytes of GPR rt. It loads the bytes of the target word that also lie in the aligned word containing the byte specified by the effective address. Bit 31 of the register is loaded, so the loaded word is sign-extended.
Conceptually, it starts at the specified byte in memory and loads that byte into the high-order (left-most) byte of the register; then it loads bytes from memory into the register until it reaches the low-order byte of the word in memory. The least-significant (right-most) byte(s) of the register are not changed.
Example, little-endian memory (each memory byte is labelled with its address):
  address 4:                           7  6  5  4
  address 0:                           3  2  1  0
  register $24 before:                 D  C  B  A
  register $24 after LWL $24,4 ($0):   4  C  B  A

Example, big-endian memory (each memory byte is labelled with its address):
  address 4:                           4  5  6  7
  address 0:                           0  1  2  3
  register $24 before:                 a  b  c  d
  register $24 after LWL $24,1 ($0):   1  2  3  d

The contents of GPR rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LWL (or LWR) instruction which also specifies register rt.
No address exceptions due to alignment are possible.
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 0) then pAddr ← pAddr(PSIZE-1)..3 || 0^3 endif
byte ← 0^2 || (vAddr1..0 xor BigEndian^2)
word ← vAddr3..2 xor BigEndian^2
memquad ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
temp ← memquad(32*word+8*byte+7)..32*word || GPR[rt](23-8*byte)..0
GPR[rt]63..0 ← (temp31)^32 || temp
Given a doubleword in a register and a doubleword in memory, the operation of LWL is as follows:

Register (MSB, bit 63, on the left; LSB, bit 0, on the right):  a b c d e f g h

Little-endian memory:
  Address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:      I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Little-endian byte ordering (BigEndianCPU = 0). Unchanged register bytes are shown as lowercase letters. For every vAddr, bits 63..32 of the destination hold the sign bit (31) extended.

  vAddr3..0   Destination register (31..0)   Type   Offset LEM   Offset BEM
   0          X f g h                        0       0           15
   1          W X g h                        1       0           14
   2          V W X h                        2       0           13
   3          U V W X                        3       0           12
   4          T f g h                        0       4           11
   5          S T g h                        1       4           10
   6          R S T h                        2       4            9
   7          Q R S T                        3       4            8
   8          P f g h                        0       8            7
   9          O P g h                        1       8            6
  10          N O P h                        2       8            5
  11          M N O P                        3       8            4
  12          L f g h                        0      12            3
  13          K L g h                        1      12            2
  14          J K L h                        2      12            1
  15          I J K L                        3      12            0

Big-endian memory:
  Big-endian address:      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  Little-endian address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:                    I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Big-endian byte ordering (BigEndianCPU = 1). Unchanged register bytes are shown as lowercase letters. For every vAddr, bits 63..32 of the destination hold the sign bit (31) extended.

  vAddr3..0   Destination register (31..0)   Type   Offset LEM   Offset BEM
   0          I J K L                        3      12            0
   1          J K L h                        2      12            1
   2          K L g h                        1      12            2
   3          L f g h                        0      12            3
   4          M N O P                        3       8            4
   5          N O P h                        2       8            5
   6          O P g h                        1       8            6
   7          P f g h                        0       8            7
   8          Q R S T                        3       4            8
   9          R S T h                        2       4            9
  10          S T g h                        1       4           10
  11          T f g h                        0       4           11
  12          U V W X                        3       0           12
  13          V W X h                        2       0           13
  14          W X g h                        1       0           14
  15          X f g h                        0       0           15

  LEM     Little-endian memory (BigEndianMem = 0)
  BEM     BigEndianMem = 1
  Type    AccessLength sent to memory
  Offset  pAddr2..0 sent to memory

Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, i.e. zeroing bits 63..32 of the destination register when bit 31 is loaded. See SLL or SLLV for a single-instruction method of propagating the word sign bit in a register into the upper half of a 64-bit register.
LWR    Load Word Right    MIPS I

Bits:    31..26       25..21   20..16   15..0
Field:   LWR 100110   base     rt       offset
Width:   6            5        5        16

Format:      LWR rt, offset (base)
Purpose:     To load the less-significant part of a word from an unaligned memory address as a signed value.
Description: rt ← rt MERGE memory [base + offset]
Paired LWL and LWR instructions are used to load a register with a word from four consecutive bytes in memory starting at an arbitrary byte address. LWL loads the left (most-significant) bytes and LWR loads the right (least-significant) bytes.
The instruction adds the 16-bit signed offset to the contents of GPR base to form the effective address. This is the address of the least-significant byte of a word composed of four consecutive bytes in memory. LWR loads from one to four bytes, the least-significant bytes of the word, into the corresponding bytes of GPR rt. It loads the bytes of the target word that also lie in the aligned word containing the byte specified by the effective address. If the word sign bit (bit 31) is loaded from memory into the register by the instruction, then the loaded word is sign-extended. If the sign bit is not loaded from memory by the LWR, then bits 63..32 of the destination are unchanged.
Conceptually, it starts at the specified byte in memory and loads that byte into the low-order (right-most) byte of the register; then it loads bytes from memory into the register until it reaches the high-order byte of the word in memory. The most-significant (left-most) byte(s) of the register are not changed.
Example, little-endian memory (each memory byte is labelled with its address):
  address 4:                           7  6  5  4
  address 0:                           3  2  1  0
  register $24 before:                 D  C  B  A
  register $24 after LWR $24,1 ($0):   D  3  2  1

Example, big-endian memory (each memory byte is labelled with its address):
  address 4:                           4  5  6  7
  address 0:                           0  1  2  3
  register $24 before:                 A  B  C  D
  register $24 after LWR $24,4 ($0):   A  B  C  4

The contents of GPR rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LWR (or LWL) instruction which also specifies register rt. No address exceptions due to alignment are possible.
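The LWL/LWR merge and its sign-extension rule can be modelled in C for the little-endian case. This is an illustrative sketch only: memory is a plain byte array, the function names are hypothetical, and where LWR does not load bit 31 the sketch assumes the "bits 63..32 unchanged" behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the aligned little-endian word that contains byte addr. */
static uint32_t read_aligned_word(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)3;
    return (uint32_t)mem[base]
         | (uint32_t)mem[base + 1] << 8
         | (uint32_t)mem[base + 2] << 16
         | (uint32_t)mem[base + 3] << 24;
}

static uint64_t sign_extend32(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* LWL model: always loads bit 31, so the result is sign-extended. */
static uint64_t lwl_le(uint64_t rt, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (3u - (unsigned)(addr & 3));
    uint32_t keep = ((uint32_t)1 << shift) - 1;   /* low bytes left unchanged */
    uint32_t temp = ((uint32_t)rt & keep) | (read_aligned_word(mem, addr) << shift);
    return sign_extend32(temp);
}

/* LWR model: sign-extends only when bit 31 is loaded (aligned address);
   otherwise bits 63..32 are kept (an assumption, see the lead-in). */
static uint64_t lwr_le(uint64_t rt, const uint8_t *mem, size_t addr)
{
    unsigned shift = 8u * (unsigned)(addr & 3);
    uint32_t keep = ~(0xFFFFFFFFu >> shift);      /* high bytes left unchanged */
    uint32_t temp = ((uint32_t)rt & keep) | (read_aligned_word(mem, addr) >> shift);
    if (shift == 0)
        return sign_extend32(temp);
    return (rt & 0xFFFFFFFF00000000ull) | temp;
}
```

An unaligned word at address A would then be read as lwr_le(lwl_le(0, mem, A + 3), mem, A); because LWL runs first and sign-extends, the upper half is already correct when LWR fills in the remaining low bytes.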
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 1) then pAddr ← pAddr(PSIZE-1)..3 || 0^3 endif
byte ← 0 || (vAddr1..0 xor BigEndian^2)
word ← vAddr3..2 xor BigEndian^2
memquad ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
temp ← GPR[rt]31..(32-8*byte) || memquad(31+32*word)..(32*word+8*byte)
if (byte = 4) then
    utemp ← (temp31)^32          /* loaded bit 31, must sign extend */
else
    one of the following two behaviors:
    utemp ← GPR[rt]63..32        /* leave what was there alone */
    utemp ← (GPR[rt]31)^32       /* sign-extend bit 31 */
endif
GPR[rt]63..0 ← utemp || temp

Given a word in a register and a word in memory, the operation of LWR is as follows:
Register (MSB, bit 63, on the left; LSB, bit 0, on the right):  a b c d e f g h

Little-endian memory:
  Address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:      I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Little-endian byte ordering (BigEndianCPU = 0). Unchanged register bytes are shown as lowercase letters.

  vAddr3..0   Bits 63..32                           Bits 31..0   Type   Offset LEM   Offset BEM
   0          Sign bit (31) extended                e f g I      0      15            0
   1          Sign bit (31) extended or unchanged   e f I J      1      14            0
   2          Sign bit (31) extended or unchanged   e I J K      2      13            0
   3          Sign bit (31) extended or unchanged   I J K L      3      12            0
   4          Sign bit (31) extended                e f g M      0      11            4
   5          Sign bit (31) extended or unchanged   e f M N      1      10            4
   6          Sign bit (31) extended or unchanged   e M N O      2       9            4
   7          Sign bit (31) extended or unchanged   M N O P      3       8            4
   8          Sign bit (31) extended                e f g Q      0       7            8
   9          Sign bit (31) extended or unchanged   e f Q R      1       6            8
  10          Sign bit (31) extended or unchanged   e Q R S      2       5            8
  11          Sign bit (31) extended or unchanged   Q R S T      3       4            8
  12          Sign bit (31) extended                e f g U      0       3           12
  13          Sign bit (31) extended or unchanged   e f U V      1       2           12
  14          Sign bit (31) extended or unchanged   e U V W      2       1           12
  15          Sign bit (31) extended or unchanged   U V W X      3       0           12

Big-endian memory:
  Big-endian address:      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  Little-endian address:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  Byte:                    I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X

Big-endian byte ordering (BigEndianCPU = 1). Unchanged register bytes are shown as lowercase letters.

  vAddr3..0   Bits 63..32                           Bits 31..0   Type   Offset LEM   Offset BEM
   0          Sign bit (31) extended or unchanged   e f g I      0      15            0
   1          Sign bit (31) extended or unchanged   e f I J      1      14            0
   2          Sign bit (31) extended or unchanged   e I J K      2      13            0
   3          Sign bit (31) extended                I J K L      3      12            0
   4          Sign bit (31) extended or unchanged   e f g M      0      11            4
   5          Sign bit (31) extended or unchanged   e f M N      1      10            4
   6          Sign bit (31) extended or unchanged   e M N O      2       9            4
   7          Sign bit (31) extended                M N O P      3       8            4
   8          Sign bit (31) extended or unchanged   e f g Q      0       7            8
   9          Sign bit (31) extended or unchanged   e f Q R      1       6            8
  10          Sign bit (31) extended or unchanged   e Q R S      2       5            8
  11          Sign bit (31) extended                Q R S T      3       4            8
  12          Sign bit (31) extended or unchanged   e f g U      0       3           12
  13          Sign bit (31) extended or unchanged   e f U V      1       2           12
  14          Sign bit (31) extended or unchanged   e U V W      2       1           12
  15          Sign bit (31) extended                U V W X      3       0           12

  LEM     Little-endian memory (BigEndian = 0)
  BEM     BigEndianMem = 1
  Type    AccessLength sent to memory
  Offset  pAddr2..0 sent to memory

Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, i.e. zeroing bits 63..32 of the destination register when bit 31 is loaded. See SLL or SLLV for a single-instruction method of propagating the word sign bit in a register into the upper half of a 64-bit register.
LWU    Load Word Unsigned    MIPS III

Bits:    31..26       25..21   20..16   15..0
Field:   LWU 100111   base     rt       offset
Width:   6            5        5        16

Format:      LWU rt, offset (base)
Purpose:     To load a word from memory as an unsigned value.
Description: rt ← memory [base + offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
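The contrast with the signed word load (LW) can be shown with a small C sketch of the two widening rules. The names lw_result and lwu_result are hypothetical, and the cast to int32_t assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/* Hypothetical model of LW: the 32-bit word is sign-extended. */
static uint64_t lw_result(uint32_t word)
{
    return (uint64_t)(int64_t)(int32_t)word;
}

/* Hypothetical model of LWU: the 32-bit word is zero-extended,
   i.e. 0^32 || word. */
static uint64_t lwu_result(uint32_t word)
{
    return (uint64_t)word;
}
```

For the word 0x80000001, the signed form gives 0xFFFFFFFF80000001 and the unsigned form gives 0x0000000080000001.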
Restrictions:
The effective address must be naturally aligned. If either of the two least-significant bits of the address are non-zero, an Address Error Exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr1..0) ≠ 0^2 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian^2 || 0^2))
memquad ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
byte ← vAddr3..0 xor (BigEndian^2 || 0^2)
GPR[rt]63..0 ← 0^32 || memquad(31+8*byte)..8*byte
Exceptions:
TLB Refill TLB Invalid Address Error
Programming Notes:
None
MFHI    Move from HI Register    MIPS I

Bits:    31..26           25..16           15..11   10..6     5..0
Field:   SPECIAL 000000   0 00 0000 0000   rd       0 00000   MFHI 010000
Width:   6                10               5        5         6

Format:      MFHI rd
Purpose:     To copy the special purpose HI register to a GPR.
Description: rd ← HI
The contents of special register HI are loaded into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← HI63..0
Exceptions:
None
Programming Notes:
No restriction is needed because C790 has an interlock mechanism for MULT or DIV instructions.
MFLO    Move from LO Register    MIPS I

Bits:    31..26           25..16           15..11   10..6     5..0
Field:   SPECIAL 000000   0 00 0000 0000   rd       0 00000   MFLO 010010
Width:   6                10               5        5         6

Format:      MFLO rd
Purpose:     To copy the special purpose LO register to a GPR.
Description: rd ← LO
The contents of special register LO are loaded into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← LO63..0
Exceptions:
None
Programming Notes:
(Same as MFHI)
MOVN    Move Conditional on Not Zero    MIPS IV

Bits:    31..26           25..21   20..16   15..11   10..6     5..0
Field:   SPECIAL 000000   rs       rt       rd       0 00000   MOVN 001011
Width:   6                5        5        5        5         6

Format:      MOVN rd, rs, rt
Purpose:     To conditionally move a GPR after testing a GPR value.
Description: if (rt ≠ 0) then rd ← rs
If the value in GPR rt is not equal to zero, then the contents of GPR rs are placed into GPR rd.
Restrictions:
None
Operation:
if GPR[rt]63..0 ≠ 0 then
    GPR[rd]63..0 ← GPR[rs]63..0
endif
Exceptions:
None
Programming Notes:
The nonzero value tested here is the "condition true" result from the SLT, SLTI, SLTU, and SLTIU comparison instructions.
MOVZ    Move Conditional on Zero    MIPS IV

Bits:    31..26           25..21   20..16   15..11   10..6     5..0
Field:   SPECIAL 000000   rs       rt       rd       0 00000   MOVZ 001010
Width:   6                5        5        5        5         6

Format:      MOVZ rd, rs, rt
Purpose:     To conditionally move a GPR after testing a GPR value.
Description: if (rt = 0) then rd ← rs
If the value in GPR rt is equal to zero, then the contents of GPR rs are placed into GPR rd.
Restrictions:
None
Operation:
if GPR[rt]63..0 = 0 then
    GPR[rd]63..0 ← GPR[rs]63..0
endif
Exceptions:
None
Programming Notes:
The zero value tested here is the "condition false" result from the SLT, SLTI, SLTU, and SLTIU comparison instructions.
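The pairing of a set-on-less-than comparison with a conditional move allows a branch-free select. The C sketch below models the register-level effect; the function names movn, movz, slt and max_signed are hypothetical stand-ins for the instructions, not a real API.

```c
#include <stdint.h>

/* MOVN model: rd takes rs when rt is non-zero, otherwise keeps its value. */
static uint64_t movn(uint64_t rd, uint64_t rs, uint64_t rt)
{
    return (rt != 0) ? rs : rd;
}

/* MOVZ model: rd takes rs when rt is zero, otherwise keeps its value. */
static uint64_t movz(uint64_t rd, uint64_t rs, uint64_t rt)
{
    return (rt == 0) ? rs : rd;
}

/* SLT model: 1 ("condition true") when a < b as signed values, else 0. */
static uint64_t slt(int64_t a, int64_t b)
{
    return (uint64_t)(a < b);
}

/* Branch-free signed maximum: start with a, replace it by b if a < b. */
static int64_t max_signed(int64_t a, int64_t b)
{
    uint64_t result = (uint64_t)a;
    result = movn(result, (uint64_t)b, slt(a, b));
    return (int64_t)result;
}
```

The same pattern with movz and the inverted comparison result selects the other operand, so either instruction can implement a given select depending on which comparison is cheaper to form.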
MTHI    Move to HI Register    MIPS I

Bits:    31..26           25..21   20..6                   5..0
Field:   SPECIAL 000000   rs       0 000 0000 0000 0000    MTHI 010001
Width:   6                5        15                      6

Format:      MTHI rs
Purpose:     To copy a GPR to the special purpose HI register.
Description: HI ← rs
The contents of GPR rs are loaded into special register HI.
Restrictions:
None
Operation:
HI63..0 ← GPR[rs]63..0
Exceptions:
None
Programming Notes:
None
MTLO    Move to LO Register    MIPS I

Bits:    31..26           25..21   20..6                   5..0
Field:   SPECIAL 000000   rs       0 000 0000 0000 0000    MTLO 010011
Width:   6                5        15                      6

Format:      MTLO rs
Purpose:     To copy a GPR to the special purpose LO register.
Description: LO ← rs
The contents of GPR rs are loaded into special register LO.
Restrictions:
None
Operation:
LO63..0 ← GPR[rs]63..0
Exceptions:
None
Programming Notes:
None
MULT
Multiply Word
Bits    31..26   25..21  20..16  15..6        5..0
Field   SPECIAL  rs      rt      0            MULT
Value   000000                   0000000000   011000
Width   6        5       5       10           6
MIPS I
Format: MULT rs, rt
Purpose: To multiply 32-bit signed integers.
Description: (LO, HI) ← rs × rt
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register LO, and the high-order 32-bit word is placed into special register HI. No arithmetic exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then UndefinedResult () endif
prod ← GPR[rs]31..0 * GPR[rt]31..0
LO63..0 ← (prod31)^32 || prod31..0
HI63..0 ← (prod63)^32 || prod63..32
Exceptions:
None
Programming Notes:
In the C790, the integer multiply operation proceeds asynchronously and allows other CPU instructions to execute before it is retired. An attempt to read LO or HI before the results are written will wait (interlock) until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel. Programs that require overflow detection must check for it explicitly.
MULTU
Multiply Unsigned Word
Bits    31..26   25..21  20..16  15..6        5..0
Field   SPECIAL  rs      rt      0            MULTU
Value   000000                   0000000000   011001
Width   6        5       5       10           6
MIPS I
Format: MULTU rs, rt
Purpose: To multiply 32-bit unsigned integers.
Description: (LO, HI) ← rs × rt
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned values, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register LO, and the high-order 32-bit word is placed into special register HI. No arithmetic exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then UndefinedResult () endif
prod ← (0 || GPR[rs]31..0) * (0 || GPR[rt]31..0)
LO63..0 ← (prod31)^32 || prod31..0
HI63..0 ← (prod63)^32 || prod63..32
Exceptions:
None
Programming Notes:
See the Programming Notes for the MULT instruction.
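The MULT and MULTU operations above can be sketched in C as follows. This is only a minimal model under stated assumptions: the hilo_t structure and the function names are hypothetical, the inputs are taken as the low 32 bits of the source registers, and the sketch says nothing about the asynchronous timing of the real multiplier. The last helper shows one way a program might check explicitly for signed overflow, since the instructions themselves never trap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the LO and HI special registers. */
typedef struct { uint64_t lo, hi; } hilo_t;

/* Sign-extend a 32-bit word to 64 bits, as done when LO and HI are written. */
static uint64_t sext32(uint32_t w) {
    return (uint64_t)(int64_t)(int32_t)w;
}

/* Model of MULT: signed 32 x 32 -> 64-bit product split into LO and HI. */
static hilo_t mult(uint32_t rs, uint32_t rt) {
    int64_t prod = (int64_t)(int32_t)rs * (int64_t)(int32_t)rt;
    hilo_t r = { sext32((uint32_t)prod), sext32((uint32_t)((uint64_t)prod >> 32)) };
    return r;
}

/* Model of MULTU: unsigned 32 x 32 -> 64-bit product split into LO and HI. */
static hilo_t multu(uint32_t rs, uint32_t rt) {
    uint64_t prod = (uint64_t)rs * (uint64_t)rt;
    hilo_t r = { sext32((uint32_t)prod), sext32((uint32_t)(prod >> 32)) };
    return r;
}

/* Explicit check after MULT: the signed product fits in 32 bits only when
 * HI is the sign extension of bit 31 of LO. */
static int mult_overflows(hilo_t r) {
    uint64_t expected_hi = ((r.lo >> 31) & 1u) ? UINT64_MAX : 0u;
    return r.hi != expected_hi;
}
```

Note that both halves are sign-extended from bit 31 when written, so a MULTU result with bit 31 of the low word set still appears in LO with bits 63..32 all ones.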
NOR
Not Or
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       NOR
Value   000000                           00000   100111
Width   6        5       5       5       5       6
MIPS I
Format: NOR rd, rs, rt
Purpose: To do a bitwise logical NOT OR.
Description: rd ← rs NOR rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical NOR operation. The result is placed into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← GPR[rs]63..0 nor GPR[rt]63..0
Exceptions:
None
Programming Notes:
None
OR
Or
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       OR
Value   000000                           00000   100101
Width   6        5       5       5       5       6
MIPS I
Format: OR rd, rs, rt
Purpose: To do a bitwise logical OR.
Description: rd ← rs OR rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical OR operation. The result is placed into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← GPR[rs]63..0 or GPR[rt]63..0
Exceptions:
None
Programming Notes:
None
ORI
Or Immediate
Bits    31..26   25..21  20..16  15..0
Field   ORI      rs      rt      immediate
Value   001101
Width   6        5       5       16
MIPS I
Format: ORI rt, rs, immediate
Purpose: To do a bitwise logical OR with a constant.
Description: rt ← rs OR immediate
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical OR operation. The result is placed into GPR rt.
Restrictions:
None
Operation:
GPR[rt]63..0 ← zero_extend (immediate) or GPR[rs]63..0
Exceptions:
None
Programming Notes:
None
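Because ORI zero-extends its immediate, it can fill in the low halfword of a constant without disturbing the upper bits, even when bit 15 of the immediate is set. The C sketch below illustrates this. The ori function models the operation given above; the lui function is an assumption based on the standard MIPS definition of LUI (described elsewhere in this appendix), and load_const32 is a hypothetical helper for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ORI: the 16-bit immediate is zero-extended before the OR. */
static uint64_t ori(uint64_t rs, uint16_t imm) {
    return rs | (uint64_t)imm;
}

/* Assumed model of LUI: immediate placed in bits 31..16, low 16 bits zero,
 * 32-bit result sign-extended to 64 bits. */
static uint64_t lui(uint16_t imm) {
    return (uint64_t)(int64_t)(int32_t)((uint32_t)imm << 16);
}

/* Hypothetical helper: build a 32-bit constant with LUI followed by ORI. */
static uint64_t load_const32(uint16_t hi, uint16_t lo) {
    return ori(lui(hi), lo);
}
```

For example, load_const32(0x1234, 0x8765) yields 0x12348765; a sign-extending immediate would instead have overwritten the upper half with ones.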
PREF
Prefetch
Bits    31..26   25..21  20..16  15..0
Field   PREF     base    hint    offset
Value   110011
Width   6        5       5       16
MIPS IV
Format: PREF hint, offset (base)
Purpose: To prefetch data from memory.
Description: prefetch_memory (base+offset)
PREF adds the 16-bit signed offset to the contents of GPR base to form an effective byte address. It advises that data at the effective address may be used in the near future. If the hint field is 00000₂, this instruction prefetches a block of data from main memory into cache.
PREF is an advisory instruction. It may change the performance of the program. For all hint values and all effective addresses, it neither changes architecturally-visible state nor alters the meaning of the program.
PREF does not cause addressing-related exceptions. If it raises an exception condition, the exception condition is ignored. If an addressing-related exception condition is raised and ignored, no data will be prefetched. Even if no data is prefetched in such a case, some action that is not architecturally-visible, such as writeback of a dirty cache line, might take place.
PREF will never generate a memory operation for a location with an uncached memory access type.
The defined hint values are shown in the table below. The C790 only supports hint = 0. The hint table may be extended in future implementations.
Values of hint field for prefetch instruction
Value   Name        Data use and desired prefetch action
0       load        Data is expected to be loaded (not modified). Fetch data as if for a load.
1-31    (Reserved)  (Reserved)
Restrictions:
None
Operation:
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, LOAD)
Prefetch (uncached, pAddr, vAddr, DATA, hint)
Exceptions:
None
Programming Notes:
Prefetch cannot prefetch data from a mapped location unless the translation for that location is present in the TLB. Locations in memory pages that have not been accessed recently may not have translations in the TLB, so prefetch may not be effective for such locations. Prefetch on the C790 may not prefetch data while a bus read is outstanding due to a data cache miss, an uncached load, or a miss on the uncached accelerated buffer. Prefetch does not cause addressing exceptions, so prefetching with a pointer value before the validity of the pointer has been determined will not cause an exception.
Implementation Notes:
A reserved hint field value causes a default prefetch action, the load hint.
SB
Store Byte
Bits    31..26   25..21  20..16  15..0
Field   SB       base    rt      offset
Value   101000
Width   6        5       5       16
MIPS I
Format: SB rt, offset (base)
Purpose: To store a byte to memory.
Description: memory [base + offset] ← rt
The least-significant 8-bit byte of GPR rt is stored in memory at the location specified by the effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
byte ← vAddr3..0 xor BigEndian^4
dataquad ← GPR[rt](127-8*byte)..0 || 0^(8*byte)
StoreMemory (uncached, BYTE, dataquad, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error
Programming Notes:
None
SD
Store Doubleword
Bits    31..26   25..21  20..16  15..0
Field   SD       base    rt      offset
Value   111111
Width   6        5       5       16
MIPS III
Format: SD rt, offset (base)
Purpose: To store a doubleword to memory.
Description: memory [base + offset] ← rt
The 64-bit doubleword in GPR rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If any of the three least-significant bits of the effective address are non-zero, an Address Error exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr2..0) ≠ 0^3 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian || 0^3))
byte ← vAddr3..0 xor (BigEndian || 0^3)
dataquad ← GPR[rt](127-8*byte)..0 || 0^(8*byte)
StoreMemory (uncached, DOUBLEWORD, dataquad, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error
Programming Notes:
None
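The address formation and natural-alignment rule used by the aligned stores (SB, SH, SW, SD) can be sketched in C as below. This is a minimal illustration, not the C790 implementation: the function names are hypothetical, and the 32-bit wraparound of the sum is an assumption drawn from the Operation sections, which add the sign-extended offset to bits 31..0 of GPR base.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address: sign-extended 16-bit offset plus the low 32 bits of base.
 * Wrapping the sum to 32 bits is an assumption of this sketch. */
static uint32_t effective_address(uint64_t base, int16_t offset) {
    return (uint32_t)base + (uint32_t)(int32_t)offset;
}

/* Returns nonzero when a store of `size` bytes (1, 2, 4, or 8) to vaddr is not
 * naturally aligned and would therefore raise an Address Error exception. */
static int misaligned(uint32_t vaddr, unsigned size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return (vaddr & (size - 1u)) != 0;
}
```

For SD, size is 8, so the check reduces to testing the three least-significant address bits, matching the restriction stated above.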
SDL
Store Doubleword Left
Bits    31..26   25..21  20..16  15..0
Field   SDL      base    rt      offset
Value   101100
Width   6        5       5       16
MIPS III
Format: SDL rt, offset (base)
Purpose: To store the more-significant part of a doubleword to an unaligned memory address.
Description: memory [base + offset] ← rt
Paired SDL and SDR instructions are used to store a doubleword from a register into eight consecutive bytes in memory starting at an arbitrary byte address. SDL stores the left (most-significant) bytes and SDR stores the right (least-significant) bytes. The 16-bit signed offset is added to the contents of GPR base to form the effective address of the most-significant byte of the contiguous doubleword in memory. It alters only the doubleword in memory which contains that byte. From one to eight bytes will be stored, depending on the starting byte specified. Conceptually, it starts at the most-significant byte of the register and copies it to the specified byte in memory; then it copies bytes from register to memory until it reaches the low-order byte of the doubleword in memory. No address exceptions due to alignment are possible.
Example, memory (little-endian): register $24 holds bytes H G F E D C B A (H most significant). After SDL $24,10 ($0), memory bytes 10, 9, and 8 hold H, G, and F; bytes 15..11 and 7..0 are unchanged.
Example, memory (big-endian): register $24 holds bytes A B C D E F G H (A most significant). After SDL $24,1 ($0), memory bytes 1 through 7 hold A through G; byte 0 and bytes 8..15 are unchanged.
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 0) then
    pAddr ← pAddr(PSIZE-1)..3 || 0^3
endif
byte ← 0 || (vAddr2..0 xor BigEndian^3)
if (vAddr3 xor BigEndian = 0) then
    dataquad ← 0^64 || 0^(56-8*byte) || GPR[rt]63..(56-8*byte)
else
    dataquad ← 0^(56-8*byte) || GPR[rt]63..(56-8*byte) || 0^64
endif
StoreMemory (uncached, byte, dataquad, pAddr, vAddr, DATA)
Given a doubleword in a register and a doubleword in memory, the operation of SDL is as follows:
SDL
Register (bits 63..0, MSB first): A B C D E F G H
Memory (little-endian byte addresses 15..0): i j k l m n o p q r s t u v w x
Little-endian byte ordering (BigEndianCPU = 0)
Destination memory contents after instruction, bits 127..0 (lowercase bytes are unchanged)
vAddr3..0   127..64            63..0
0           i j k l m n o p    q r s t u v w A
1           i j k l m n o p    q r s t u v A B
2           i j k l m n o p    q r s t u A B C
3           i j k l m n o p    q r s t A B C D
4           i j k l m n o p    q r s A B C D E
5           i j k l m n o p    q r A B C D E F
6           i j k l m n o p    q A B C D E F G
7           i j k l m n o p    A B C D E F G H
8           i j k l m n o A    q r s t u v w x
9           i j k l m n A B    q r s t u v w x
10          i j k l m A B C    q r s t u v w x
11          i j k l A B C D    q r s t u v w x
12          i j k A B C D E    q r s t u v w x
13          i j A B C D E F    q r s t u v w x
14          i A B C D E F G    q r s t u v w x
15          A B C D E F G H    q r s t u v w x
Type (for vAddr3..0 = 0 to 15): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Offset LEM (for vAddr3..0 = 0 to 15): 8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0
Offset BEM (for vAddr3..0 = 0 to 15): 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SDL
Register (bits 63..0, MSB first): A B C D E F G H
Memory (big-endian byte addresses 0..15; little-endian addresses 15..0): i j k l m n o p q r s t u v w x
Big-endian byte ordering (BigEndianCPU = 1)
Destination memory contents after instruction, bits 127..0 (lowercase bytes are unchanged)
vAddr3..0   127..64            63..0
0           A B C D E F G H    q r s t u v w x
1           i A B C D E F G    q r s t u v w x
2           i j A B C D E F    q r s t u v w x
3           i j k A B C D E    q r s t u v w x
4           i j k l A B C D    q r s t u v w x
5           i j k l m A B C    q r s t u v w x
6           i j k l m n A B    q r s t u v w x
7           i j k l m n o A    q r s t u v w x
8           i j k l m n o p    A B C D E F G H
9           i j k l m n o p    q A B C D E F G
10          i j k l m n o p    q r A B C D E F
11          i j k l m n o p    q r s A B C D E
12          i j k l m n o p    q r s t A B C D
13          i j k l m n o p    q r s t u A B C
14          i j k l m n o p    q r s t u v A B
15          i j k l m n o p    q r s t u v w A
Type (for vAddr3..0 = 0 to 15): 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Offset LEM (for vAddr3..0 = 0 to 15): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Offset BEM (for vAddr3..0 = 0 to 15): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
LEM: Little-endian memory (BigEndianMem = 0)
BEM: Big-endian memory (BigEndianMem = 1)
Type: AccessLength sent to memory
Offset: pAddr3..0 sent to memory
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error
Programming Notes:
None
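The byte-copying behavior of SDL on little-endian memory can be modeled with a short C sketch. This is only an illustration under stated assumptions: memory is represented as a plain byte array indexed by byte address, the name sdl_le is hypothetical, and address translation, exceptions, and the big-endian case are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SDL for little-endian memory.
 * mem is a byte array indexed by byte address (an assumption of this sketch). */
static void sdl_le(uint8_t *mem, uint32_t vaddr, uint64_t rt) {
    unsigned byte = vaddr & 7u;   /* position within the aligned doubleword */
    assert(mem != 0);
    for (unsigned k = 0; k <= byte; k++) {
        /* The k-th most-significant register byte goes to address vaddr - k,
         * stopping at the low-order byte of the doubleword in memory. */
        mem[vaddr - k] = (uint8_t)(rt >> (56 - 8 * k));
    }
}
```

With a register holding bytes H G F E D C B A (H most significant), sdl_le(mem, 10, reg) writes H, G, and F to addresses 10, 9, and 8 and leaves all other bytes untouched.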
SDR
Store Doubleword Right
Bits    31..26   25..21  20..16  15..0
Field   SDR      base    rt      offset
Value   101101
Width   6        5       5       16
MIPS III
Format: SDR rt, offset (base)
Purpose: To store the less-significant part of a doubleword to an unaligned memory address.
Description: memory [base + offset] ← rt
Paired SDL and SDR instructions are used to store a doubleword from a register into eight consecutive bytes in memory starting at an arbitrary byte address. SDL stores the left (most-significant) bytes and SDR stores the right (least-significant) bytes. The SDR instruction adds its sign-extended 16-bit offset to the contents of GPR base to form an effective address which may specify an arbitrary byte. It alters only the doubleword in memory which contains that byte. From one to eight bytes will be stored, depending on the starting byte specified. Conceptually, it starts at the least-significant (rightmost) byte of the register and copies it to the specified byte in memory; then it copies bytes from register to memory until it reaches the high-order byte of the doubleword in memory. No address exceptions due to alignment are possible.
Example, memory (little-endian): register $24 holds bytes H G F E D C B A (A least significant). After SDR $24,3 ($0), memory bytes 7 through 3 hold E, D, C, B, and A; bytes 15..8 and 2..0 are unchanged.
Example, memory (big-endian): register $24 holds bytes A B C D E F G H (H least significant). After SDR $24,5 ($0), memory bytes 0 through 5 hold C, D, E, F, G, and H; bytes 6, 7, and 8..15 are unchanged.
Restrictions:
None
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 0) then
    pAddr ← pAddr(PSIZE-1)..3 || 0^3
endif
byte ← vAddr2..0 xor BigEndian^3
if (vAddr3 xor BigEndian = 0) then
    dataquad ← 0^64 || GPR[rt](63-8*byte)..0 || 0^(8*byte)
else
    dataquad ← GPR[rt](63-8*byte)..0 || 0^(8*byte) || 0^64
endif
StoreMemory (uncached, DOUBLEWORD-byte, dataquad, pAddr, vAddr, DATA)
Given a doubleword in a register and a doubleword in memory, the operation of SDR is as follows:
SDR
Register (bits 63..0, MSB first): A B C D E F G H
Memory (little-endian byte addresses 15..0): i j k l m n o p q r s t u v w x
Little-endian byte ordering (BigEndianCPU = 0)
Destination memory contents after instruction, bits 127..0 (lowercase bytes are unchanged)
vAddr3..0   127..64            63..0
0           i j k l m n o p    A B C D E F G H
1           i j k l m n o p    B C D E F G H x
2           i j k l m n o p    C D E F G H w x
3           i j k l m n o p    D E F G H v w x
4           i j k l m n o p    E F G H u v w x
5           i j k l m n o p    F G H t u v w x
6           i j k l m n o p    G H s t u v w x
7           i j k l m n o p    H r s t u v w x
8           A B C D E F G H    q r s t u v w x
9           B C D E F G H p    q r s t u v w x
10          C D E F G H o p    q r s t u v w x
11          D E F G H n o p    q r s t u v w x
12          E F G H m n o p    q r s t u v w x
13          F G H l m n o p    q r s t u v w x
14          G H k l m n o p    q r s t u v w x
15          H j k l m n o p    q r s t u v w x
Type (for vAddr3..0 = 0 to 15): 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
Offset LEM (for vAddr3..0 = 0 to 15): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Offset BEM (for vAddr3..0 = 0 to 15): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
SDR
Register (bits 63..0, MSB first): A B C D E F G H
Memory (big-endian byte addresses 0..15; little-endian addresses 15..0): i j k l m n o p q r s t u v w x
Big-endian byte ordering (BigEndianCPU = 1)
Destination memory contents after instruction, bits 127..0 (lowercase bytes are unchanged)
vAddr3..0   127..64            63..0
0           H j k l m n o p    q r s t u v w x
1           G H k l m n o p    q r s t u v w x
2           F G H l m n o p    q r s t u v w x
3           E F G H m n o p    q r s t u v w x
4           D E F G H n o p    q r s t u v w x
5           C D E F G H o p    q r s t u v w x
6           B C D E F G H p    q r s t u v w x
7           A B C D E F G H    q r s t u v w x
8           i j k l m n o p    H r s t u v w x
9           i j k l m n o p    G H s t u v w x
10          i j k l m n o p    F G H t u v w x
11          i j k l m n o p    E F G H u v w x
12          i j k l m n o p    D E F G H v w x
13          i j k l m n o p    C D E F G H w x
14          i j k l m n o p    B C D E F G H x
15          i j k l m n o p    A B C D E F G H
Type (for vAddr3..0 = 0 to 15): 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Offset LEM (for vAddr3..0 = 0 to 15): 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Offset BEM (for vAddr3..0 = 0 to 15): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LEM: Little-endian memory (BigEndianMem = 0)
BEM: Big-endian memory (BigEndianMem = 1)
Type: AccessLength sent to memory
Offset: pAddr3..0 sent to memory
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error
Programming Notes:
None
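The pairing of SDL and SDR for an unaligned doubleword store can be sketched in C for the little-endian case. This is a hedged model, not the hardware behavior in full: memory is a plain byte array, the function names are hypothetical, and the choice of SDR at the target address with SDL at the target address plus 7 is the conventional little-endian pairing, stated here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical little-endian model of SDL: most-significant register bytes
 * are written from vaddr down to the start of the aligned doubleword. */
static void sdl_le(uint8_t *mem, uint32_t vaddr, uint64_t rt) {
    unsigned byte = vaddr & 7u;
    for (unsigned k = 0; k <= byte; k++)
        mem[vaddr - k] = (uint8_t)(rt >> (56 - 8 * k));
}

/* Hypothetical little-endian model of SDR: least-significant register bytes
 * are written from vaddr up to the end of the aligned doubleword. */
static void sdr_le(uint8_t *mem, uint32_t vaddr, uint64_t rt) {
    unsigned byte = vaddr & 7u;
    for (unsigned k = 0; k + byte <= 7u; k++)
        mem[vaddr + k] = (uint8_t)(rt >> (8 * k));
}

/* Unaligned 64-bit store at addr: SDR rt, 0(addr) then SDL rt, 7(addr). */
static void store_unaligned_le(uint8_t *mem, uint32_t addr, uint64_t value) {
    assert(mem != 0);
    sdr_le(mem, addr, value);
    sdl_le(mem, addr + 7u, value);
}
```

When addr is already doubleword-aligned, each instruction writes all eight bytes, so the pair remains correct, if redundant.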
SH
Store Halfword
Bits    31..26   25..21  20..16  15..0
Field   SH       base    rt      offset
Value   101001
Width   6        5       5       16
MIPS I
Format: SH rt, offset (base)
Purpose: To store a halfword to memory.
Description: memory [base + offset] ← rt
The least-significant 16-bit halfword of register rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr0) ≠ 0 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian^3 || 0))
byte ← vAddr3..0 xor (BigEndian^3 || 0)
dataquad ← GPR[rt](127-8*byte)..0 || 0^(8*byte)
StoreMemory (uncached, HALFWORD, dataquad, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error
Programming Notes:
None
SLL
Shift Word Left Logical
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  0       rt      rd      sa      SLL
Value   000000   00000                           000000
Width   6        5       5       5       5       6
MIPS I
Format: SLL rd, rt, sa
Purpose: To left shift a word by a fixed number of bits.
Description: rd ← rt << sa
The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeroes into the emptied bits; the word result is placed in GPR rd. The bit shift count is specified by sa. The result word is sign-extended.
Restrictions:
None
Operation:
s ← sa
temp ← GPR[rt](31-s)..0 || 0^s
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
Unlike nearly all other word operations the input operand does not have to be a properly sign-extended word value to produce a valid sign-extended 32-bit result. The result word is always sign extended into a 64-bit destination register; this instruction with a zero shift amount truncates a 64-bit value to 32 bits and sign extends it and stores it in the destination register.
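The note above can be illustrated with a small C model of SLL. The function name sll is hypothetical, and the sketch assumes only what the Operation section states: the low 32 bits of rt are shifted and the 32-bit result is sign-extended into the 64-bit destination.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SLL: shift the low word of rt left by sa (0..31)
 * and sign-extend the 32-bit result to 64 bits. */
static uint64_t sll(uint64_t rt, unsigned sa) {
    assert(sa < 32);
    uint32_t temp = (uint32_t)rt << sa;        /* upper 32 bits of rt are ignored */
    return (uint64_t)(int64_t)(int32_t)temp;   /* sign-extend bit 31 */
}
```

With sa = 0, the model simply truncates its input to 32 bits and sign-extends it, which is the idiom the note describes.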
SLLV
Shift Word Left Logical Variable
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SLLV
Value   000000                           00000   000100
Width   6        5       5       5       5       6
MIPS I
Format: SLLV rd, rt, rs
Purpose: To left shift a word by a variable number of bits.
Description: rd ← rt << rs
The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeroes into the emptied bits; the result word is placed in GPR rd. The bit shift count is specified by the low-order five bits of GPR rs. The result word is sign-extended.
Restrictions:
None
Operation:
s ← GPR[rs]4..0
temp ← GPR[rt](31-s)..0 || 0^s
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
None
SLT
Set on Less Than
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SLT
Value   000000                           00000   101010
Width   6        5       5       5       5       6
MIPS I
Format: SLT rd, rs, rt
Purpose: To record the result of a less-than comparison.
Description: rd ← (rs < rt)
Compare the contents of GPR rs and GPR rt as signed integers and record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt the result is 1 (true), otherwise 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if GPR[rs]63..0 < GPR[rt]63..0 then
    GPR[rd]63..0 ← 0^(GPRLEN-1) || 1
else
    GPR[rd]63..0 ← 0^GPRLEN
endif
Exceptions:
None
Programming Notes:
None
SLTI
Set on Less Than Immediate
Bits    31..26   25..21  20..16  15..0
Field   SLTI     rs      rt      immediate
Value   001010
Width   6        5       5       16
MIPS I
Format: SLTI rt, rs, immediate
Purpose: To record the result of a less-than comparison with a constant.
Description: rt ← (rs < immediate)
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate the result is 1 (true), otherwise 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if GPR[rs]63..0 < sign_extend (immediate) then
    GPR[rt]63..0 ← 0^(GPRLEN-1) || 1
else
    GPR[rt]63..0 ← 0^GPRLEN
endif
Exceptions:
None
Programming Notes:
None
SLTIU
Set on Less Than Immediate Unsigned
Bits    31..26   25..21  20..16  15..0
Field   SLTIU    rs      rt      immediate
Value   001011
Width   6        5       5       16
MIPS I
Format: SLTIU rt, rs, immediate
Purpose: To record the result of an unsigned less-than comparison with a constant.
Description: rt ← (rs < immediate)
Compare the contents of GPR rs and the sign-extended 16-bit immediate as unsigned integers and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate the result is 1 (true), otherwise 0 (false). Because the 16-bit immediate is sign-extended before comparison, the instruction is able to represent the smallest or largest unsigned numbers. The representable values are at the minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range. The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if (0 || GPR[rs]63..0) < (0 || sign_extend (immediate)) then
    GPR[rt]63..0 ← 0^(GPRLEN-1) || 1
else
    GPR[rt]63..0 ← 0^GPRLEN
endif
Exceptions:
None
Programming Notes:
None
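The way SLTIU sign-extends its immediate and then compares as unsigned can be surprising, so a brief C sketch may help. The function name sltiu is a hypothetical helper for this illustration; the immediate is passed as a signed 16-bit value to mirror the instruction field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SLTIU: the 16-bit immediate is sign-extended to
 * 64 bits, then both operands are compared as unsigned integers. */
static uint64_t sltiu(uint64_t rs, int16_t immediate) {
    uint64_t ext = (uint64_t)(int64_t)immediate;   /* sign-extend, reinterpret */
    return (rs < ext) ? 1u : 0u;
}
```

An immediate of -1 therefore stands for the largest unsigned value, so sltiu(x, -1) is 1 for every x except that maximum, while sltiu(x, 1) is 1 only when x is zero.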
SLTU
Set on Less Than Unsigned
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SLTU
Value   000000                           00000   101011
Width   6        5       5       5       5       6
MIPS I
Format: SLTU rd, rs, rt
Purpose: To record the result of an unsigned less-than comparison.
Description: rd ← (rs < rt)
Compare the contents of GPR rs and GPR rt as unsigned integers and record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt the result is 1 (true), otherwise 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
Restrictions:
None
Operation:
if (0 || GPR[rs]63..0) < (0 || GPR[rt]63..0) then
    GPR[rd]63..0 ← 0^(GPRLEN-1) || 1
else
    GPR[rd]63..0 ← 0^GPRLEN
endif
Exceptions:
None
Programming Notes:
None
SRA
Shift Word Right Arithmetic
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  0       rt      rd      sa      SRA
Value   000000   00000                           000011
Width   6        5       5       5       5       6
MIPS I
Format: SRA rd, rt, sa
Purpose: To arithmetic right shift a word by a fixed number of bits.
Description: rd ← rt >> sa (arithmetic)
The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign bit (bit 31) in the emptied bits; the word result is placed in GPR rd. The bit shift count is specified by sa. The result word is sign-extended.
Restrictions:
If GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rt]63..0)) then UndefinedResult () endif
s ← sa
temp ← (GPR[rt]31)^s || GPR[rt]31..s
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
None
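As an informal illustration of the SRA operation and its word-value restriction, the following C sketch models both. The names is_word_value and sra are hypothetical; is_word_value stands in for the negation of the NotWordValue test used in the Operation section, and the sketch builds the sign fill explicitly rather than relying on the C right-shift of a negative value.

```c
#include <assert.h>
#include <stdint.h>

/* True when bits 63..31 of v are all equal, i.e. v is a sign-extended word. */
static int is_word_value(uint64_t v) {
    return (uint64_t)(int64_t)(int32_t)(uint32_t)v == v;
}

/* Hypothetical model of SRA: arithmetic right shift of the low word by sa
 * (0..31), with the 32-bit result sign-extended to 64 bits. */
static uint64_t sra(uint64_t rt, unsigned sa) {
    assert(sa < 32);
    uint32_t w = (uint32_t)rt;
    /* Replicate bit 31 into the sa emptied high-order bits. */
    uint32_t fill = ((w >> 31) && sa != 0) ? (~0u << (32 - sa)) : 0u;
    uint32_t temp = (w >> sa) | fill;
    return (uint64_t)(int64_t)(int32_t)temp;
}
```

A caller that wants to respect the restriction would check is_word_value(rt) first, since the architecture leaves the result undefined otherwise.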
SRAV
Shift Word Right Arithmetic Variable
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SRAV
Value   000000                           00000   000111
Width   6        5       5       5       5       6
MIPS I
Format: SRAV rd, rt, rs
Purpose: To arithmetic right shift a word by a variable number of bits.
Description: rd ← rt >> rs (arithmetic)
The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign bit (bit 31) in the emptied bits; the word result is placed in GPR rd. The bit shift count is specified by the low-order five bits of GPR rs. The result word is sign-extended.
Restrictions:
If GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rt]63..0)) then UndefinedResult () endif
s ← GPR[rs]4..0
temp ← (GPR[rt]31)^s || GPR[rt]31..s
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
None
SRL
Shift Word Right Logical
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  0       rt      rd      sa      SRL
Value   000000   00000                           000010
Width   6        5       5       5       5       6
MIPS I
Format: SRL rd, rt, sa
Purpose: To logical right shift a word by a fixed number of bits.
Description: rd ← rt >> sa (logical)
The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits; the word result is placed in GPR rd. The bit shift count is specified by sa. The result word is sign-extended.
Restrictions:
If GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rt]63..0)) then UndefinedResult () endif
s ← sa
temp ← 0^s || GPR[rt]31..s
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
None
SRLV
Shift Word Right Logical Variable
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SRLV
Value   000000                           00000   000110
Width   6        5       5       5       5       6
MIPS I
Format: SRLV rd, rt, rs
Purpose: To logical right shift a word by a variable number of bits.
Description: rd ← rt >> rs (logical)
The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits; the word result is placed in GPR rd. The bit shift count is specified by the low-order five bits of GPR rs. The result word is sign-extended.
Restrictions:
If GPR rt does not contain a sign-extended 32-bit value (bits 63..31 equal) then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rt]63..0)) then UndefinedResult () endif
s ← GPR[rs]4..0
temp ← 0^s || GPR[rt]31..s
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
None
SUB
Subtract Word
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SUB
Value   000000                           00000   100010
Width   6        5       5       5       5       6
MIPS I
Format: SUB rd, rs, rt
Purpose: To subtract 32-bit integers. If overflow occurs, then trap.
Description: rd ← rs - rt
The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs to produce a 32-bit result. If the subtraction results in 32-bit 2's complement arithmetic overflow then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR rd.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rs]63..0) or NotWordValue (GPR[rt]63..0)) then UndefinedResult () endif
temp ← GPR[rs]63..0 - GPR[rt]63..0
if (32_bit_arithmetic_overflow) then
    SignalException (IntegerOverflow)
else
    GPR[rd]63..0 ← sign_extend (temp31..0)
endif
Exceptions:
Integer Overflow
Programming Notes:
SUBU performs the same arithmetic operation but does not trap on overflow.
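The difference between the trapping and non-trapping subtract can be sketched in C. This is a minimal model under stated assumptions: sub_overflows and subu are hypothetical helper names, the overflow test widens to 64 bits rather than reproducing whatever detection logic the hardware uses, and the exception itself is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when rs - rt overflows 32-bit two's complement arithmetic,
 * i.e. when SUB would signal an Integer Overflow exception. */
static int sub_overflows(int32_t rs, int32_t rt) {
    int64_t wide = (int64_t)rs - (int64_t)rt;   /* exact result in 64 bits */
    return wide < INT32_MIN || wide > INT32_MAX;
}

/* Hypothetical model of SUBU: 32-bit modulo subtraction, never trapping,
 * with the 32-bit result sign-extended to 64 bits. */
static uint64_t subu(uint32_t rs, uint32_t rt) {
    return (uint64_t)(int64_t)(int32_t)(rs - rt);
}
```

A program that needs SUB semantics without taking the exception could call a check like sub_overflows before using the SUBU result.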
SUBU
Subtract Unsigned Word
Bits    31..26   25..21  20..16  15..11  10..6   5..0
Field   SPECIAL  rs      rt      rd      0       SUBU
Value   000000                           00000   100011
Width   6        5       5       5       5       6
MIPS I
Format: SUBU rd, rs, rt
Purpose: To subtract 32-bit integers.
Description: rd ← rs - rt
The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd. No integer overflow exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), then the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rs]63..0) or NotWordValue (GPR[rt]63..0)) then UndefinedResult () endif
temp ← GPR[rs]63..0 - GPR[rt]63..0
GPR[rd]63..0 ← sign_extend (temp31..0)
Exceptions:
None
Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. It is appropriate for arithmetic which is not signed, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
SW
Store Word
Bits    31..26   25..21  20..16  15..0
Field   SW       base    rt      offset
Value   101011
Width   6        5       5       16
MIPS I
Format: SW rt, offset (base)
Purpose: To store a word to memory.
Description: memory [base + offset] ← rt
The least-significant 32-bit word of register rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
Restrictions:
The effective address must be naturally aligned. If either of the two least-significant bits of the address are non-zero, an Address Error exception occurs.
Operation: (128-bit bus)
vAddr ← sign_extend (offset) + GPR[base]31..0
if (vAddr1..0) ≠ 0^2 then SignalException (AddressError) endif
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor (BigEndian^2 || 0^2))
byte ← vAddr3..0 xor (BigEndian^2 || 0^2)
dataquad ← GPR[rt](127-8*byte)..0 || 0^(8*byte)
StoreMemory (uncached, WORD, dataquad, pAddr, vAddr, DATA)
Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error
Programming Notes:
None
SWL
Store Word Left
Bits    31..26   25..21  20..16  15..0
Field   SWL      base    rt      offset
Value   101010
Width   6        5       5       16
MIPS I
Format: SWL rt, offset (base)
Purpose: To store the more-significant part of a word to an unaligned memory address.
Description: memory [base + offset] ← rt
Paired SWL and SWR instructions are used to store a word from a register into four consecutive bytes in memory starting at an arbitrary byte address. SWL stores the left (most-significant) bytes and SWR stores the right (least-significant) bytes. The SWL instruction adds its sign-extended 16-bit offset to the contents of GPR base to form an effective address which may specify an arbitrary byte. It alters only the word in memory which contains that byte. From one to four bytes will be stored, depending on the starting byte specified. Conceptually, it starts at the most-significant byte of the register and copies it to the specified byte in memory; then it copies bytes from register to memory until it reaches the low-order byte of the word in memory. No address exceptions due to alignment are possible.
Little-endian memory, register $24 = D C B A (most-significant byte first):

                        address 4 (bytes 7..4)   address 0 (bytes 3..0)
before                  7 6 5 4                  3 2 1 0
after SWL $24,6 ($0)    7 D C B                  3 2 1 0

Big-endian memory, register $24 = A B C D (most-significant byte first):

                        address 0 (bytes 0..3)   address 4 (bytes 4..7)
before                  0 1 2 3                  4 5 6 7
after SWL $24,1 ($0)    0 A B C                  4 5 6 7
Restrictions:
None
Operation:
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 0) then
    pAddr ← pAddr(PSIZE-1)..2 || 0^2
endif
byte ← vAddr1..0 xor BigEndian^2
if (vAddr3..2 xor BigEndian^2) = 00₂ then
    dataquad ← 0^96 || 0^(24-8*byte) || GPR[rt]31..(24-8*byte)
elseif (vAddr3..2 xor BigEndian^2) = 01₂ then
    dataquad ← 0^64 || 0^(24-8*byte) || GPR[rt]31..(24-8*byte) || 0^32
elseif (vAddr3..2 xor BigEndian^2) = 10₂ then
    dataquad ← 0^32 || 0^(24-8*byte) || GPR[rt]31..(24-8*byte) || 0^64
elseif (vAddr3..2 xor BigEndian^2) = 11₂ then
    dataquad ← 0^(24-8*byte) || GPR[rt]31..(24-8*byte) || 0^96
endif
StoreMemory (uncached, byte, dataquad, pAddr, vAddr, DATA)
Given a doubleword in a register and a doubleword in memory, the operation of SWL is as follows:
Register (bits 63..0, MSB first): A B C D E F G H
Memory (little-endian byte numbers 15..0; big-endian byte numbers 0..15): i j k l m n o p q r s t u v w x

Little-endian byte ordering (BigEndianCPU = 0)
Destination memory contents after instruction (lowercase bytes are unchanged)

vAddr3..0  bytes 15..8 (127..64)  bytes 7..0 (63..0)  Type  LEM offset  BEM offset
0          i j k l m n o p        q r s t u v w E     0     0           15
1          i j k l m n o p        q r s t u v E F     1     0           14
2          i j k l m n o p        q r s t u E F G     2     0           13
3          i j k l m n o p        q r s t E F G H     3     0           12
4          i j k l m n o p        q r s E u v w x     0     4           11
5          i j k l m n o p        q r E F u v w x     1     4           10
6          i j k l m n o p        q E F G u v w x     2     4           9
7          i j k l m n o p        E F G H u v w x     3     4           8
8          i j k l m n o E        q r s t u v w x     0     8           7
9          i j k l m n E F        q r s t u v w x     1     8           6
10         i j k l m E F G        q r s t u v w x     2     8           5
11         i j k l E F G H        q r s t u v w x     3     8           4
12         i j k E m n o p        q r s t u v w x     0     12          3
13         i j E F m n o p        q r s t u v w x     1     12          2
14         i E F G m n o p        q r s t u v w x     2     12          1
15         E F G H m n o p        q r s t u v w x     3     12          0

Big-endian byte ordering (BigEndianCPU = 1)
Destination memory contents after instruction (lowercase bytes are unchanged)

vAddr3..0  bytes 0..7 (127..64)   bytes 8..15 (63..0)  Type  LEM offset  BEM offset
0          E F G H m n o p        q r s t u v w x      3     12          0
1          i E F G m n o p        q r s t u v w x      2     12          1
2          i j E F m n o p        q r s t u v w x      1     12          2
3          i j k E m n o p        q r s t u v w x      0     12          3
4          i j k l E F G H        q r s t u v w x      3     8           4
5          i j k l m E F G        q r s t u v w x      2     8           5
6          i j k l m n E F        q r s t u v w x      1     8           6
7          i j k l m n o E        q r s t u v w x      0     8           7
8          i j k l m n o p        E F G H u v w x      3     4           8
9          i j k l m n o p        q E F G u v w x      2     4           9
10         i j k l m n o p        q r E F u v w x      1     4           10
11         i j k l m n o p        q r s E u v w x      0     4           11
12         i j k l m n o p        q r s t E F G H      3     0           12
13         i j k l m n o p        q r s t u E F G      2     0           13
14         i j k l m n o p        q r s t u v E F      1     0           14
15         i j k l m n o p        q r s t u v w E      0     0           15

LEM: Little-endian memory (BigEndianMem = 0)
BEM: Big-endian memory (BigEndianMem = 1)
Type: AccessLength sent to memory
Offset: pAddr3..0 sent to memory

Exceptions:
TLB Refill
TLB Invalid
TLB Modified
Address Error
Programming Notes:
None
SWR    Store Word Right    MIPS I

31..26       25..21  20..16  15..0
SWR 101110   base    rt      offset
6            5       5       16

Format: SWR rt, offset (base)
Purpose: To store the less-significant part of a word to an unaligned memory address.
Description: memory [base + offset] ← rt
Paired SWL and SWR instructions are used to store a word from a register into four consecutive bytes in memory starting at an arbitrary byte address. SWL stores the left (most-significant) bytes and SWR stores the right (least-significant) bytes. The SWR instruction adds its sign-extended 16-bit offset to the contents of GPR base to form an effective address which may specify an arbitrary byte. It alters only the word in memory which contains that byte. From one to four bytes will be stored, depending on the starting byte specified. Conceptually, it starts at the least-significant (rightmost) byte of the register and copies it to the specified byte in memory; then copies bytes from register to memory until it reaches the high-order byte of the word in memory. No address exceptions due to alignment are possible.
Little-endian memory, register $24 = D C B A (most-significant byte first):

                        address 4 (bytes 7..4)   address 0 (bytes 3..0)
before                  7 6 5 4                  3 2 1 0
after SWR $24,3 ($0)    7 6 5 4                  A 2 1 0

Big-endian memory, register $24 = A B C D (most-significant byte first):

                        address 0 (bytes 0..3)   address 4 (bytes 4..7)
before                  0 1 2 3                  4 5 6 7
after SWR $24,4 ($0)    0 1 2 3                  D 5 6 7
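The byte-copy behaviour described for SWL and SWR can be modelled on a plain byte array. The following is a minimal sketch, not the hardware data path: the function names `swl` and `swr`, the `bytearray` memory, and the `big_endian` flag are illustrative assumptions, and address translation and exceptions are ignored.

```python
def swl(mem, addr, reg, big_endian):
    """Model of SWL: copy bytes of the 32-bit value reg, starting with its
    most-significant byte at addr, toward the low-order byte of the aligned
    word that contains addr."""
    reg_bytes = reg.to_bytes(4, "big")      # reg_bytes[0] is the most-significant byte
    if big_endian:
        count = 4 - (addr & 3)              # low-order byte sits at the highest address
        for i in range(count):
            mem[addr + i] = reg_bytes[i]
    else:
        count = (addr & 3) + 1              # low-order byte sits at the lowest address
        for i in range(count):
            mem[addr - i] = reg_bytes[i]


def swr(mem, addr, reg, big_endian):
    """Model of SWR: copy bytes of reg, starting with its least-significant
    byte at addr, toward the high-order byte of the aligned word."""
    reg_bytes = reg.to_bytes(4, "little")   # reg_bytes[0] is the least-significant byte
    if big_endian:
        count = (addr & 3) + 1              # high-order byte sits at the lowest address
        for i in range(count):
            mem[addr - i] = reg_bytes[i]
    else:
        count = 4 - (addr & 3)              # high-order byte sits at the highest address
        for i in range(count):
            mem[addr + i] = reg_bytes[i]


def store_unaligned(mem, addr, reg, big_endian):
    """Paired SWL/SWR storing a word at an arbitrary byte address."""
    if big_endian:
        swl(mem, addr, reg, True)
        swr(mem, addr + 3, reg, True)
    else:
        swl(mem, addr + 3, reg, False)
        swr(mem, addr, reg, False)
```

With memory initialised so that each byte holds its own address, `swl(mem, 6, reg, False)` followed by `swr(mem, 3, reg, False)` corresponds to the little-endian SWL $24,6 ($0) and SWR $24,3 ($0) examples and leaves the whole word in bytes 3..6.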
Restrictions:
None
Operation:
vAddr ← sign_extend (offset) + GPR[base]31..0
(pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
pAddr ← pAddr(PSIZE-1)..4 || (pAddr3..0 xor BigEndian^4)
if (BigEndian = 0) then
    pAddr ← pAddr(PSIZE-1)..2 || 0^2
endif
byte ← vAddr1..0 xor BigEndian^2
if (vAddr3..2 xor BigEndian^2) = 00₂ then
    dataquad ← 0^96 || GPR[rt](31-8*byte)..0 || 0^(8*byte)
elseif (vAddr3..2 xor BigEndian^2) = 01₂ then
    dataquad ← 0^64 || GPR[rt](31-8*byte)..0 || 0^(8*byte) || 0^32
elseif (vAddr3..2 xor BigEndian^2) = 10₂ then
    dataquad ← 0^32 || GPR[rt](31-8*byte)..0 || 0^(8*byte) || 0^64
elseif (vAddr3..2 xor BigEndian^2) = 11₂ then
    dataquad ← GPR[rt](31-8*byte)..0 || 0^(8*byte) || 0^96
endif
StoreMemory (uncached, WORD-byte, dataquad, pAddr, vAddr, DATA)
Given a doubleword in a register and a doubleword in memory, the operation of SWR is as follows:
Register (bits 63..0, MSB first): A B C D E F G H
Memory (little-endian byte numbers 15..0; big-endian byte numbers 0..15): i j k l m n o p q r s t u v w x

Little-endian byte ordering (BigEndianCPU = 0)
Destination memory contents after instruction (lowercase bytes are unchanged)

vAddr3..0  bytes 15..8 (127..64)  bytes 7..0 (63..0)  Type  LEM offset  BEM offset
0          i j k l m n o p        q r s t E F G H     3     0           12
1          i j k l m n o p        q r s t F G H x     2     1           12
2          i j k l m n o p        q r s t G H w x     1     2           12
3          i j k l m n o p        q r s t H v w x     0     3           12
4          i j k l m n o p        E F G H u v w x     3     4           8
5          i j k l m n o p        F G H t u v w x     2     5           8
6          i j k l m n o p        G H s t u v w x     1     6           8
7          i j k l m n o p        H r s t u v w x     0     7           8
8          i j k l E F G H        q r s t u v w x     3     8           4
9          i j k l F G H p        q r s t u v w x     2     9           4
10         i j k l G H o p        q r s t u v w x     1     10          4
11         i j k l H n o p        q r s t u v w x     0     11          4
12         E F G H m n o p        q r s t u v w x     3     12          0
13         F G H l m n o p        q r s t u v w x     2     13          0
14         G H k l m n o p        q r s t u v w x     1     14          0
15         H j k l m n o p        q r s t u v w x     0     15          0

Big-endian byte ordering (BigEndianCPU = 1)
Destination memory contents after instruction (lowercase bytes are unchanged)

vAddr3..0  bytes 0..7 (127..64)   bytes 8..15 (63..0)  Type  LEM offset  BEM offset
0          H j k l m n o p        q r s t u v w x      0     15          0
1          G H k l m n o p        q r s t u v w x      1     14          0
2          F G H l m n o p        q r s t u v w x      2     13          0
3          E F G H m n o p        q r s t u v w x      3     12          0
4          i j k l H n o p        q r s t u v w x      0     11          4
5          i j k l G H o p        q r s t u v w x      1     10          4
6          i j k l F G H p        q r s t u v w x      2     9           4
7          i j k l E F G H        q r s t u v w x      3     8           4
8          i j k l m n o p        H r s t u v w x      0     7           8
9          i j k l m n o p        G H s t u v w x      1     6           8
10         i j k l m n o p        F G H t u v w x      2     5           8
11         i j k l m n o p        E F G H u v w x      3     4           8
12         i j k l m n o p        q r s t H v w x      0     3           12
13         i j k l m n o p        q r s t G H w x      1     2           12
14         i j k l m n o p        q r s t F G H x      2     1           12
15         i j k l m n o p        q r s t E F G H      3     0           12

LEM: Little-endian memory (BigEndianMem = 0)
BEM: Big-endian memory (BigEndianMem = 1)
Type: AccessLength sent to memory
Offset: pAddr3..0 sent to memory

Exceptions:
TLB Refill
TLB Invalid
TLB Modified
Address Error
Programming Notes:
None
SYNC.stype    Synchronize Shared Memory    MIPS II

31..26           25..11                 10..6   5..0
SPECIAL 000000   0 000 0000 0000 0000   stype   SYNC 001111
6                15                     5       6

Format: SYNC (stype = 0xxxx)
        SYNC.L (stype = 0xxxx)
        SYNC.P (stype = 1xxxx)
Purpose: To perform either a memory barrier operation or a pipeline barrier operation.
Description:
This instruction interlocks the pipeline until either all pending loads and stores are completed or all earlier issued instructions are completed.

In the case of the SYNC or SYNC.L instruction (memory barrier), all pending loads and stores are retired. Loads are retired when the destination register is written. Stores are retired when the stored data (in store buffers or write buffers) is either stored in the data cache, or sent on the processor bus and SYSDACK* has been asserted. Any uncached accelerated data gathering operation is terminated, and the uncached accelerated buffer is invalidated. All bus read processes due to load/store/pref/cache instructions are completed. All pending bus write processes in the write back buffer are completed.

In the case of the SYNC.P instruction (pipeline barrier), all instructions prior to the barrier are completed before the instructions following the barrier operation are fetched. Note that the barrier operation does not wait for any instruction which was issued prior to the barrier operation but not retired (e.g., multiply, divide, multicycle COP1 operations or a pending load).
Operation:
SyncOperation (stype)
Exceptions:
None
Programming Notes:
The SYNC instruction (SYNC.P or SYNC.L) is not allowed in the branch delay slot of instructions which have branch delay slots.
SYSCALL    System Call    MIPS I

31..26           25..6   5..0
SPECIAL 000000   code    SYSCALL 001100
6                20      6

Format: SYSCALL
Purpose: To cause a System Call exception.
Description:
A system call exception occurs, immediately and unconditionally transferring control to the exception handler. The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
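Since the code field is only reachable by reading the instruction word itself, a handler has to mask it out of that word. The sketch below assumes the handler has already loaded the 32-bit instruction word into an integer; the helper names `encode_syscall` and `syscall_code` are illustrative and not part of any C790 software interface.

```python
SPECIAL_OPCODE = 0b000000
SYSCALL_FUNCTION = 0b001100


def encode_syscall(code):
    """Build a SYSCALL instruction word with the given 20-bit code field."""
    if not 0 <= code < (1 << 20):
        raise ValueError("code must fit in 20 bits")
    return (SPECIAL_OPCODE << 26) | (code << 6) | SYSCALL_FUNCTION


def syscall_code(word):
    """Extract the code field (bits 25..6) from a SYSCALL instruction word."""
    opcode = (word >> 26) & 0x3F
    function = word & 0x3F
    if opcode != SPECIAL_OPCODE or function != SYSCALL_FUNCTION:
        raise ValueError("not a SYSCALL instruction word")
    return (word >> 6) & 0xFFFFF
```

The same masking approach applies to the 10-bit code field (bits 15..6) of the register-form trap instructions.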
Restrictions:
None
Operation:
SignalException (SystemCall)
Exceptions:
System Call
Programming Notes:
None
TEQ    Trap if Equal    MIPS II

31..26           25..21  20..16  15..6  5..0
SPECIAL 000000   rs      rt      code   TEQ 110100
6                5       5       10     6

Format: TEQ rs, rt
Purpose: To compare GPRs and do a conditional Trap.
Description: if (rs = rt) then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is equal to GPR rt then take a Trap exception. The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs]63..0 = GPR[rt] 63..0 then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TEQI    Trap if Equal Immediate    MIPS II

31..26          25..21  20..16       15..0
REGIMM 000001   rs      TEQI 01100   immediate
6               5       5            16

Format: TEQI rs, immediate
Purpose: To compare a GPR to a constant and do a conditional Trap.
Description: if (rs = immediate) then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is equal to immediate then take a Trap exception.
Restrictions:
None
Operation:
if GPR [rs] 63..0 = sign_extend (immediate) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TGE    Trap if Greater or Equal    MIPS II

31..26           25..21  20..16  15..6  5..0
SPECIAL 000000   rs      rt      code   TGE 110000
6                5       5       10     6

Format: TGE rs, rt
Purpose: To compare GPRs and do a conditional Trap.
Description: if (rs ≥ rt) then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is greater than or equal to GPR rt then take a Trap exception. The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs]63..0 ≥ GPR[rt]63..0 then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TGEI    Trap if Greater or Equal Immediate    MIPS II

31..26          25..21  20..16       15..0
REGIMM 000001   rs      TGEI 01000   immediate
6               5       5            16

Format: TGEI rs, immediate
Purpose: To compare a GPR to a constant and do a conditional Trap.
Description: if (rs ≥ immediate) then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is greater than or equal to immediate then take a Trap exception.
Restrictions:
None
Operation:
if GPR[rs]63..0 ≥ sign_extend (immediate) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TGEIU    Trap if Greater or Equal Immediate Unsigned    MIPS II

31..26          25..21  20..16        15..0
REGIMM 000001   rs      TGEIU 01001   immediate
6               5       5             16

Format: TGEIU rs, immediate
Purpose: To compare a GPR to a constant and do a conditional Trap.
Description: if (rs ≥ immediate) then Trap
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers; if GPR rs is greater than or equal to immediate then take a Trap exception. Because the 16-bit immediate is sign-extended before comparison, the instruction is able to represent the smallest or largest unsigned numbers. The representable values are at the minimum [0,32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range.
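The effect of sign-extending the immediate before an unsigned comparison can be checked with a few lines of arithmetic. This is a sketch on Python integers, assuming 64-bit register values; `sign_extend16` and `tgeiu_traps` are illustrative names, not functions defined by this manual.

```python
MASK64 = (1 << 64) - 1   # max_unsigned for a 64-bit register


def sign_extend16(immediate):
    """Sign-extend a 16-bit immediate to 64 bits, returned as an unsigned value."""
    immediate &= 0xFFFF
    if immediate & 0x8000:
        return (immediate | ~0xFFFF) & MASK64   # upper 48 bits become ones
    return immediate


def tgeiu_traps(rs_value, immediate):
    """True when TGEIU would trap: unsigned rs >= sign-extended immediate."""
    return (rs_value & MASK64) >= sign_extend16(immediate)
```

Immediates 0x0000..0x7FFF map to [0, 32767], and 0x8000..0xFFFF map to [max_unsigned-32767, max_unsigned], which is why only the two ends of the unsigned range are representable.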
Restrictions:
None
Operation:
if (0 || GPR[rs]63..0) ≥ (0 || sign_extend (immediate)) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TGEU    Trap if Greater or Equal Unsigned    MIPS II

31..26           25..21  20..16  15..6  5..0
SPECIAL 000000   rs      rt      code   TGEU 110001
6                5       5       10     6

Format: TGEU rs, rt
Purpose: To compare GPRs and do a conditional Trap.
Description: if (rs ≥ rt) then Trap
Compare the contents of GPR rs and GPR rt as unsigned integers; if GPR rs is greater than or equal to GPR rt then take a Trap exception. The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if (0 || GPR[rs]63..0) ≥ (0 || GPR[rt]63..0) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TLT    Trap if Less Than    MIPS II

31..26           25..21  20..16  15..6  5..0
SPECIAL 000000   rs      rt      code   TLT 110010
6                5       5       10     6

Format: TLT rs, rt
Purpose: To compare GPRs and do a conditional Trap.
Description: if (rs < rt) then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is less than GPR rt then take a Trap exception. The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR [rs] 63..0 < GPR [rt] 63..0 then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TLTI    Trap if Less Than Immediate    MIPS II

31..26          25..21  20..16       15..0
REGIMM 000001   rs      TLTI 01010   immediate
6               5       5            16

Format: TLTI rs, immediate
Purpose: To compare a GPR to a constant and do a conditional Trap.
Description: if (rs < immediate) then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is less than immediate then take a Trap exception.
Restrictions:
None
Operation:
if GPR[rs] 63..0 < sign_extend (immediate) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TLTIU    Trap if Less Than Immediate Unsigned    MIPS II

31..26          25..21  20..16        15..0
REGIMM 000001   rs      TLTIU 01011   immediate
6               5       5             16

Format: TLTIU rs, immediate
Purpose: To compare a GPR to a constant and do a conditional Trap.
Description: if (rs < immediate) then Trap
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers; if GPR rs is less than immediate then take a Trap exception. Because the 16-bit immediate is sign-extended before comparison, the instruction is able to represent the smallest or largest unsigned numbers. The representable values are at the minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range.
Restrictions:
None
Operation:
if (0 || GPR[rs] 63..0) < (0 || sign_extend (immediate)) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TLTU    Trap if Less Than Unsigned    MIPS II

31..26           25..21  20..16  15..6  5..0
SPECIAL 000000   rs      rt      code   TLTU 110011
6                5       5       10     6

Format: TLTU rs, rt
Purpose: To compare GPRs and do a conditional Trap.
Description: if (rs < rt) then Trap
Compare the contents of GPR rs and GPR rt as unsigned integers; if GPR rs is less than GPR rt then take a Trap exception. The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if (0 || GPR[rs] 63..0) < (0 || GPR[rt] 63..0) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TNE    Trap if Not Equal    MIPS II

31..26           25..21  20..16  15..6  5..0
SPECIAL 000000   rs      rt      code   TNE 110110
6                5       5       10     6

Format: TNE rs, rt
Purpose: To compare GPRs and do a conditional Trap.
Description: if (rs ≠ rt) then Trap
Compare the contents of GPR rs and GPR rt as signed integers; if GPR rs is not equal to GPR rt then take a Trap exception. The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
Restrictions:
None
Operation:
if GPR[rs]63..0 ≠ GPR[rt]63..0 then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
TNEI    Trap if Not Equal Immediate    MIPS II

31..26          25..21  20..16       15..0
REGIMM 000001   rs      TNEI 01110   immediate
6               5       5            16

Format: TNEI rs, immediate
Purpose: To compare a GPR to a constant and do a conditional Trap.
Description: if (rs ≠ immediate) then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; if GPR rs is not equal to immediate then take a Trap exception.
Restrictions:
None
Operation:
if GPR[rs]63..0 ≠ sign_extend (immediate) then SignalException (Trap) endif
Exceptions:
Trap
Programming Notes:
None
XOR    Exclusive OR    MIPS I

31..26           25..21  20..16  15..11  10..6     5..0
SPECIAL 000000   rs      rt      rd      0 00000   XOR 100110
6                5       5       5       5         6

Format: XOR rd, rs, rt
Purpose: To do a bitwise logical EXCLUSIVE OR.
Description: rd ← rs XOR rt
Combine the contents of GPR rs and GPR rt in a bitwise logical exclusive OR operation and place the result into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← GPR[rs]63..0 xor GPR[rt]63..0
Exceptions:
None
Programming Notes:
None
XORI    Exclusive OR Immediate    MIPS I

31..26        25..21  20..16  15..0
XORI 001110   rs      rt      immediate
6             5       5       16

Format: XORI rt, rs, immediate
Purpose: To do a bitwise logical EXCLUSIVE OR with a constant.
Description: rt ← rs XOR immediate
Combine the contents of GPR rs and the 16-bit zero-extended immediate in a bitwise logical exclusive OR operation and place the result into GPR rt.
Restrictions:
None
Operation:
GPR[rt]63..0 ← GPR[rs]63..0 xor zero_extend (immediate)
Exceptions:
None
Programming Notes:
None
A.5 CPU Instruction Encoding
The following table shows the OpCode encoding of CPU instructions for the MIPS IV architecture. This architecture level includes all MIPS I, MIPS II, MIPS III and some MIPS IV instructions. Even though the OpCodes for MTSAB, MTSAH, MFSA, MTSA, LQ, and SQ are shown in this OpCode table, these instructions are described in Appendix B since they are C790-specific instructions. Coprocessor 0 (COP0 - System Control Processor), Coprocessor 1 (COP1 - Floating-point Processor) and C790 specific instructions are described in separate sections.
31..26   25..0
OpCode

Instructions encoded by OpCode field

              OpCode bits 28..26
bits 31..29   0 000    1 001    2 010   3 011   4 100   5 101   6 110   7 111
0 000         SPECIAL  REGIMM   J       JAL     BEQ     BNE     BLEZ    BGTZ
1 001         ADDI     ADDIU    SLTI    SLTIU   ANDI    ORI     XORI    LUI
2 010         COP0     COP1     -       -       BEQL    BNEL    BLEZL   BGTZL
3 011         DADDI    DADDIU   LDL     LDR     MMI     -       LQ      SQ
4 100         LB       LH       LWL     LW      LBU     LHU     LWR     LWU
5 101         SB       SH       SWL     SW      SDL     SDR     SWR     CACHE
6 110         -        LWC1     -       PREF    -       LDC1    -       LD
7 111         -        SWC1     -       -       -       SDC1    -       SD
31..26             5..0
OpCode = SPECIAL   function

Instructions encoded by function field when OpCode field = SPECIAL

              function bits 2..0
bits 5..3     0 000   1 001   2 010   3 011   4 100     5 101   6 110    7 111
0 000         SLL     -       SRL     SRA     SLLV      -       SRLV     SRAV
1 001         JR      JALR    MOVZ    MOVN    SYSCALL   BREAK   -        SYNC
2 010         MFHI    MTHI    MFLO    MTLO    DSLLV     -       DSRLV    DSRAV
3 011         MULT    MULTU   DIV     DIVU    -         -       -        -
4 100         ADD     ADDU    SUB     SUBU    AND       OR      XOR      NOR
5 101         MFSA    MTSA    SLT     SLTU    DADD      DADDU   DSUB     DSUBU
6 110         TGE     TGEU    TLT     TLTU    TEQ       -       TNE      -
7 111         DSLL    -       DSRL    DSRA    DSLL32    -       DSRL32   DSRA32
31..26            20..16
OpCode = REGIMM   rt

Instructions encoded by rt field when OpCode field = REGIMM

              rt bits 18..16
bits 20..19   0 000    1 001    2 010     3 011     4 100   5 101   6 110   7 111
0 00          BLTZ     BGEZ     BLTZL     BGEZL     -       -       -       -
1 01          TGEI     TGEIU    TLTI      TLTIU     TEQI    -       TNEI    -
2 10          BLTZAL   BGEZAL   BLTZALL   BGEZALL   -       -       -       -
3 11          MTSAB    MTSAH    -         -         -       -       -       -

*  This OpCode is reserved for future use. An attempt to execute it causes a Reserved Instruction exception.

This OpCode is reserved for one of the following instructions which are currently not supported: DMULT, DMULTU, DDIV, DDIVU, LL, LLD, SC, SCD, LWC2, SWC2. An attempt to execute it causes a Reserved Instruction exception.

This OpCode indicates an instruction class. The instruction word must be further decoded by examining additional tables that show the values for another instruction field.

This OpCode indicates C790-specific instructions. It is included in the table because it uses a primary OpCode in the instruction encoding map.

This OpCode is a coprocessor operation, not a CPU operation. If the processor state does not allow access to the specified coprocessor, the instruction causes a Coprocessor Unusable exception. It is included in the table because it uses a primary OpCode in the instruction encoding map.

This OpCode indicates the class of Coprocessor 0 (System Control Processor) instructions. If the processor state does not allow access to coprocessor 0, the instruction causes a Coprocessor Unusable exception. Further encoding information for this instruction class is in the COP0 Instruction Encoding tables.

This OpCode indicates the class of Coprocessor 1 (Floating-Point Processor) instructions. If the processor state does not allow access to coprocessor 1, the instruction causes a Coprocessor Unusable exception. Further encoding information for this instruction class is in the COP1 Instruction Encoding tables.
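The two-level decoding implied by these tables (primary OpCode first, then the function field for SPECIAL or the rt field for REGIMM) can be sketched as follows. The lookup dictionaries cover only a handful of the encodings listed in this appendix, and the function name `decode` is an illustrative assumption rather than part of any toolchain.

```python
# Partial lookup tables, taken from the encodings given in this appendix.
PRIMARY = {0b101010: "SWL", 0b101011: "SW", 0b101110: "SWR", 0b001110: "XORI"}
SPECIAL_FUNCTION = {
    0b001100: "SYSCALL", 0b001111: "SYNC", 0b100110: "XOR",
    0b110000: "TGE", 0b110001: "TGEU", 0b110010: "TLT",
    0b110011: "TLTU", 0b110100: "TEQ", 0b110110: "TNE",
}
REGIMM_RT = {
    0b01000: "TGEI", 0b01001: "TGEIU", 0b01010: "TLTI",
    0b01011: "TLTIU", 0b01100: "TEQI", 0b01110: "TNEI",
}


def decode(word):
    """Return the mnemonic for a 32-bit instruction word, or None if unknown here."""
    opcode = (word >> 26) & 0x3F          # bits 31..26
    if opcode == 0b000000:                # SPECIAL: decode function, bits 5..0
        return SPECIAL_FUNCTION.get(word & 0x3F)
    if opcode == 0b000001:                # REGIMM: decode rt, bits 20..16
        return REGIMM_RT.get((word >> 16) & 0x1F)
    return PRIMARY.get(opcode)
```

For example, a word whose upper six bits are 101010 decodes as SWL regardless of the base, rt, and offset fields.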

Appendix B C790-Specific Instruction Set Details
B. C790-Specific Instruction Set Details
This appendix provides a detailed description of the operation of each C790-specific instruction. The C790's instruction set is extended from the original MIPS ISA in order to support embedded applications. There are three classes of C790-specific instructions:
* Three-operand Multiply and Multiply-Add instructions
* Multiply and Multiply-Add instructions for pipeline 1
* Multimedia instructions
B.1 Conventions Used in This Chapter
The HI and LO registers are 128 bits wide. Some instructions operate on either the lower or the upper doublewords of these registers, and there are also instructions which operate on the complete registers. The following terminology is used for these registers.
* Strictly speaking, a reference to the least-significant doubleword of the HI and LO register should use the names HI0 and LO0. However, to be consistent with existing MIPS terminology, these registers are just called HI and LO.
* Reference to the upper doublewords of the HI and LO registers is made by using the names HI1 and LO1.
* Occasionally, based on context, the complete 128-bit registers are referred to as HI and LO.
* Any portion of these registers can use the names HI and LO with the appropriate bit width specifications. Thus HI1 can be referred to as HI127..64 and LO1 can be referred to as LO127..64, etc.
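The naming convention amounts to slicing a 128-bit value into two doublewords. As a small illustration, assuming a 128-bit register is held in a Python integer (the helper names are hypothetical):

```python
MASK64 = (1 << 64) - 1


def lower_doubleword(reg128):
    """Bits 63..0 of a 128-bit register value (HI or LO in MIPS terminology)."""
    return reg128 & MASK64


def upper_doubleword(reg128):
    """Bits 127..64 of a 128-bit register value (HI1 or LO1)."""
    return (reg128 >> 64) & MASK64
```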
B.1.1 Instruction Description Notation and Functions
The Operation sections of the instruction descriptions describe the operation performed by each instruction using a high-level language notation, or pseudocode. Symbols, functions, and structures used in the Operation sections are described here.
B.1.2 Pseudocode Language Statement Execution
Each of the high-level language statements in an operation description is executed in sequential order (as modified by conditional and loop constructs).
B.1.3 Pseudocode Symbols
Special symbols used in the notation are described in Appendix A.
B.2 Definitions for Pseudocode Functions Used in Operation Descriptions
A variety of functions are used in the pseudocode descriptions to make the pseudocode more readable and also to abstract implementation-specific behavior. These functions are defined in Appendix A.
B.3 Summary of C790-Specific Instructions
B.3.1 Multiply and Multiply-Add Instructions

* Three-Operand Multiply and Multiply-Add (4 instructions)

  MADD     Multiply/Add
  MADDU    Multiply/Add Unsigned
  MULT     Multiply (3-operand)
  MULTU    Multiply Unsigned (3-operand)

* Multiply Instructions for Pipeline 1 (10 instructions)

  MULT1    Multiply Pipeline 1
  MULTU1   Multiply Unsigned Pipeline 1
  DIV1     Divide Pipeline 1
  DIVU1    Divide Unsigned Pipeline 1
  MADD1    Multiply-Add Pipeline 1
  MADDU1   Multiply-Add Unsigned Pipeline 1
  MFHI1    Move From HI1 Register
  MFLO1    Move From LO1 Register
  MTHI1    Move To HI1 Register
  MTLO1    Move To LO1 Register

B.3.2 Multimedia Instructions

* Arithmetic (19 instructions)

  PADDB    Parallel Add Byte
  PSUBB    Parallel Subtract Byte
  PADDH    Parallel Add Halfword
  PSUBH    Parallel Subtract Halfword
  PADDW    Parallel Add Word
  PSUBW    Parallel Subtract Word
  PADSBH   Parallel Add/Subtract Halfword
  PADDSB   Parallel Add with Signed Saturation Byte
  PSUBSB   Parallel Subtract with Signed Saturation Byte
  PADDSH   Parallel Add with Signed Saturation Halfword
  PSUBSH   Parallel Subtract with Signed Saturation Halfword
  PADDSW   Parallel Add with Signed Saturation Word
  PSUBSW   Parallel Subtract with Signed Saturation Word
  PADDUB   Parallel Add with Unsigned Saturation Byte
  PSUBUB   Parallel Subtract with Unsigned Saturation Byte
  PADDUH   Parallel Add with Unsigned Saturation Halfword
  PSUBUH   Parallel Subtract with Unsigned Saturation Halfword
  PADDUW   Parallel Add with Unsigned Saturation Word
  PSUBUW   Parallel Subtract with Unsigned Saturation Word
* Min/Max (4 instructions)

  PMAXH    Parallel Maximum Halfword
  PMINH    Parallel Minimum Halfword
  PMAXW    Parallel Maximum Word
  PMINW    Parallel Minimum Word

* Absolute (2 instructions)

  PABSH    Parallel Absolute Halfword
  PABSW    Parallel Absolute Word

* Logical (4 instructions)

  PAND     Parallel AND
  POR      Parallel OR
  PXOR     Parallel XOR
  PNOR     Parallel NOR

* Shift (9 instructions)

  PSLLH    Parallel Shift Left Logical Halfword
  PSRLH    Parallel Shift Right Logical Halfword
  PSRAH    Parallel Shift Right Arithmetic Halfword
  PSLLW    Parallel Shift Left Logical Word
  PSRLW    Parallel Shift Right Logical Word
  PSRAW    Parallel Shift Right Arithmetic Word
  PSLLVW   Parallel Shift Left Logical Variable Word
  PSRLVW   Parallel Shift Right Logical Variable Word
  PSRAVW   Parallel Shift Right Arithmetic Variable Word

* Compare (6 instructions)

  PCGTB    Parallel Compare for Greater Than Byte
  PCEQB    Parallel Compare for Equal Byte
  PCGTH    Parallel Compare for Greater Than Halfword
  PCEQH    Parallel Compare for Equal Halfword
  PCGTW    Parallel Compare for Greater Than Word
  PCEQW    Parallel Compare for Equal Word

* LZC (1 instruction)

  PLZCW    Parallel Leading Zero or One Count Word

* Quadword Load and Store (2 instructions)

  LQ       Load Quadword
  SQ       Store Quadword
* Multiply and Divide (19 instructions)

  PMULTW   Parallel Multiply Word
  PMULTUW  Parallel Multiply Unsigned Word
  PDIVW    Parallel Divide Word
  PDIVUW   Parallel Divide Unsigned Word
  PMADDW   Parallel Multiply-Add Word
  PMADDUW  Parallel Multiply-Add Unsigned Word
  PMSUBW   Parallel Multiply-Subtract Word
  PMULTH   Parallel Multiply Halfword
  PMADDH   Parallel Multiply-Add Halfword
  PMSUBH   Parallel Multiply-Subtract Halfword
  PHMADH   Parallel Horizontal Multiply-Add Halfword
  PHMSBH   Parallel Horizontal Multiply-Subtract Halfword
  PDIVBW   Parallel Divide Broadcast Word
  PMFHI    Parallel Move From HI Register
  PMFLO    Parallel Move From LO Register
  PMTHI    Parallel Move To HI Register
  PMTLO    Parallel Move To LO Register
  PMFHL    Parallel Move From HI/LO Register
  PMTHL    Parallel Move To HI/LO Register

* Pack/Extend (11 instructions)

  PPAC5    Parallel Pack to 5 bits
  PPACB    Parallel Pack to Byte
  PPACH    Parallel Pack to Halfword
  PPACW    Parallel Pack to Word
  PEXT5    Parallel Extend Upper from 5 bits
  PEXTUB   Parallel Extend Upper from Byte
  PEXTLB   Parallel Extend Lower from Byte
  PEXTUH   Parallel Extend Upper from Halfword
  PEXTLH   Parallel Extend Lower from Halfword
  PEXTUW   Parallel Extend Upper from Word
  PEXTLW   Parallel Extend Lower from Word

* Others (16 instructions)

  PCPYH    Parallel Copy Halfword
  PCPYLD   Parallel Copy Lower Doubleword
  PCPYUD   Parallel Copy Upper Doubleword
  PREVH    Parallel Reverse Halfword
  PINTH    Parallel Interleave Halfword
  PINTEH   Parallel Interleave Even Halfword
  PEXEH    Parallel Exchange Even Halfword
  PEXCH    Parallel Exchange Center Halfword
  PEXEW    Parallel Exchange Even Word
  PEXCW    Parallel Exchange Center Word
  QFSRV    Quadword Funnel Shift Right Variable
  MFSA     Move from Shift Amount Register
  MTSA     Move to Shift Amount Register
  MTSAB    Move Byte Count to Shift Amount Register
  MTSAH    Move Halfword Count to Shift Amount Register
  PROT3W   Parallel Rotate 3 Words
B.4 Instruction Set Details
In the following sections, details are provided for each of the C790-specific instructions. Exceptions that may occur due to the execution of each instruction are listed after the description of each instruction. Descriptions of the immediate cause and manner of handling exceptions are omitted from the instruction descriptions in this appendix.
DIV1    Divide Word Pipeline 1    C790

31..26       25..21  20..16  15..6          5..0
MMI 011100   rs      rt      0 0000000000   DIV1 011010
6            5       5       10             6

Format: DIV1 rs, rt
Purpose: To divide 32-bit signed integers using pipeline 1.
Description: (LO1, HI1) ← rs / rt
The 32-bit value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands as signed values. The 32-bit quotient is placed into special register LO1 (= LO127..64) and the 32-bit remainder is placed into special register HI1 (= HI127..64). No arithmetic exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal), the result of the operation is undefined. If the divisor in GPR rt is zero, the arithmetic result value is undefined.
Operation:
if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then
    UndefinedResult()
endif
q ← GPR[rs]31..0 div GPR[rt]31..0
r ← GPR[rs]31..0 mod GPR[rt]31..0
LO127..64 ← (q31)^32 || q31..0
HI127..64 ← (r31)^32 || r31..0
Supplementary Explanation:
Normally, dividing the minimum signed value 0x80000000 (-2147483648) by 0xFFFFFFFF (-1) results in an overflow. With this instruction, however, no overflow exception occurs, and the result is as follows: the quotient is 0x80000000 (-2147483648) and the remainder is 0x00000000 (0). The signs of the quotient and the remainder depend on the signs of the dividend and the divisor, as shown in the table below:
Table B-1. Quotient and Remainder Signs

Dividend   Divisor    Quotient   Remainder
Positive   Positive   Positive   Positive
Positive   Negative   Negative   Positive
Negative   Positive   Negative   Negative
Negative   Negative   Positive   Negative
Exceptions:
None
Programming Notes:
In C790, the integer divide operation proceeds asynchronously and allows other CPU instructions to execute before it is retired. An attempt to read LO1 or HI1 registers before the results are written will cause an interlock until the results are ready. Out-of-order execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the divide so that other instructions can execute in parallel. No arithmetic exception occurs under any circumstances. Divide-by-zero or overflow conditions should be detected by instructions preceding the divide instruction. If the divide is asynchronous then the zero-divisor check can execute in parallel with the divide. The action taken on either divide-by-zero or overflow is either a convention within the program itself or more typically, the system software; one possibility is to take a BREAK exception with a code field value to signal the problem to the system software. As an example, the C programming language in a UNIX environment expects division by zero to either terminate the program or execute a program-specified signal handler. C does not expect overflow to cause any exceptional condition. If the C compiler uses a divide instruction, it also emits code to test for a zero divisor and execute a BREAK instruction to inform the operating system if one is detected.
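As a rough illustration of the software checks described in the Programming Notes, the following C sketch models the arithmetic of DIV1. The names div1_result and div1_model are hypothetical and do not appear in the C790 documentation. The sketch assumes both operands are already valid 32-bit signed values, and it reports a zero divisor to the caller rather than imitating the undefined hardware result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two 32-bit results of DIV1. */
typedef struct {
    int32_t lo1; /* quotient, as written to LO1 */
    int32_t hi1; /* remainder, as written to HI1 */
} div1_result;

/* Returns false for a zero divisor, where the hardware result is undefined. */
bool div1_model(int32_t rs, int32_t rt, div1_result *out)
{
    if (rt == 0) {
        return false; /* software must detect this case itself */
    }
    if (rs == INT32_MIN && rt == -1) {
        /* Overflow case: documented result, and avoids undefined behavior in C. */
        out->lo1 = INT32_MIN;
        out->hi1 = 0;
        return true;
    }
    out->lo1 = rs / rt; /* C truncates toward zero */
    out->hi1 = rs % rt; /* remainder takes the sign of the dividend */
    return true;
}
```

C's truncating division yields the same sign combinations as Table B-1, so no extra sign handling is needed in this sketch.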
DIVU1: Divide Unsigned Word Pipeline 1 (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..6 (10): 0 0000000000
  Bits 5..0 (6): DIVU1 011011

Format: DIVU1 rs, rt
Purpose: To divide 32-bit unsigned integers using pipeline 1.
Description: (LO1, HI1) ← rs / rt
The 32-bit value in GPR rs is divided by the 32-bit value in GPR rt, treating both operands as unsigned values. The 32-bit quotient is placed into special register LO1 (= LO127..64) and the 32-bit remainder is placed into special register HI1 (= HI127..64). No arithmetic exception occurs under any circumstances.
Restrictions:
If either GPR rt or GPR rs do not contain zero-extended 32-bit values (bits 63..32 equal zero), then the result of the operation is undefined. If the divisor in GPR rt is zero, the arithmetic result will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
q ← (0 || GPR[rs]31..0) div (0 || GPR[rt]31..0)
r ← (0 || GPR[rs]31..0) mod (0 || GPR[rt]31..0)
LO127..64 ← (q31)^32 || q31..0
HI127..64 ← (r31)^32 || r31..0
Exceptions:
None
Programming Notes:
See the Programming Notes for the DIV1 instruction.
LQ: Load Quadword (C790)

Encoding:
  Bits 31..26 (6): LQ 011110
  Bits 25..21 (5): base
  Bits 20..16 (5): rt
  Bits 15..0 (16): offset

Format: LQ rt, offset(base)
Purpose: To load a quadword from memory.
Description: rt ← memory[base + offset]
The contents of the 128-bit quadword at the memory location specified by the effective address are fetched and placed in the 128-bit GPR rt. The 16-bit signed offset is added to the contents of GPR base register to form the effective address. The least-significant four bits of the effective address are masked to zero (effectively creating an aligned address) before being used to access memory. No address exceptions due to alignment are possible.
Restriction:
The effective address doesn't have to be naturally aligned. The least significant 4 bits of the effective address are ignored.
Operations:
vAddr ← sign_extend(offset) + GPR[base]31..0
vAddr3..0 ← 0^4
(pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
memquad ← LoadMemory(uncached, QUADWORD, pAddr, vAddr, DATA)
GPR[rt]127..0 ← memquad
Exceptions:
TLB Refill
TLB Invalid
Address Error
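The address calculation for LQ can be sketched in C as follows. The function name lq_effective_address is hypothetical, and the sketch assumes a 32-bit virtual address as in the Operation section; it covers only the effective-address arithmetic, not translation or the memory access itself.

```c
#include <stdint.h>

/* Hypothetical model of the LQ effective-address calculation:
 * add the sign-extended 16-bit offset to the base, then force
 * the low four bits to zero to obtain a 16-byte-aligned address. */
uint32_t lq_effective_address(uint32_t base, int16_t offset)
{
    uint32_t vaddr = base + (uint32_t)(int32_t)offset; /* sign-extend, wrap modulo 2^32 */
    return vaddr & ~(uint32_t)0xF;                     /* mask bits 3..0 */
}
```

For example, a base of 0x1000 with an offset of 0x1F gives 0x101F, which is masked down to 0x1010.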
MADD: Multiply-Add Word (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MADD 000000

Format: MADD rs, rt
        MADD rd, rs, rt
Purpose: To multiply 32-bit signed integers and add.
Description: (rd, HI, LO) ← (HI, LO) + rs × rt
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit multiply result. The 64-bit multiply result is added to the contents in special registers HI and LO. The low-order 32-bit word of the result is placed into special register LO and GPR rd, and the high-order 32-bit word of the result is placed into special register HI. No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain sign-extended 32-bit values (bits 63..31 equal), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← (HI31..0 || LO31..0) + GPR[rs]31..0 * GPR[rt]31..0
LO63..0 ← (prod31)^32 || prod31..0
HI63..0 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
In C790, the integer multiply accumulate operation proceeds asynchronously and allows other CPU instructions to execute before it is retired. An attempt to read LO or HI registers before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.
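The accumulate step of MADD can be modeled with a minimal C sketch. The type hilo and the function madd_model are hypothetical names introduced only for illustration. The sketch keeps just the low 32 bits of HI and LO, since the sign-extended upper halves carry no extra information, and it assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* Hypothetical model of the low 32 bits of the HI and LO registers. */
typedef struct {
    int32_t hi;
    int32_t lo;
} hilo;

/* Adds rs * rt (signed) to the 64-bit value HI:LO and returns the
 * new LO value, which MADD also writes to GPR rd. */
int32_t madd_model(hilo *acc, int32_t rs, int32_t rt)
{
    uint64_t sum = ((uint64_t)(uint32_t)acc->hi << 32) | (uint32_t)acc->lo;
    /* Unsigned arithmetic wraps modulo 2^64, so no overflow is signaled. */
    sum += (uint64_t)((int64_t)rs * (int64_t)rt);
    acc->lo = (int32_t)(uint32_t)sum;
    acc->hi = (int32_t)(uint32_t)(sum >> 32);
    return acc->lo;
}
```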
MADD1: Multiply-Add Word Pipeline 1 (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MADD1 100000

Format: MADD1 rs, rt
        MADD1 rd, rs, rt
Purpose: To multiply 32-bit signed integers and add in Pipeline 1.
Description: (rd, HI1, LO1) ← (HI1, LO1) + rs × rt
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit multiply result. The 64-bit multiply result is added to the contents in special registers HI1 (= HI127..64) and LO1 (= LO127..64). The low-order 32-bit word of the result is placed into special register LO1 and GPR rd, and the high-order 32-bit word of the result is placed into special register HI1. No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain sign-extended 32-bit values (bits 63..31 equal), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← (HI95..64 || LO95..64) + GPR[rs]31..0 * GPR[rt]31..0
LO127..64 ← (prod31)^32 || prod31..0
HI127..64 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
In the C790, the integer multiply accumulate operation proceeds asynchronously and allows other CPU instructions to execute before it is retired. An attempt to read LO1 or HI1 registers before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.
MADDU: Multiply-Add Unsigned Word (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MADDU 000001

Format: MADDU rs, rt
        MADDU rd, rs, rt
Purpose: To multiply 32-bit unsigned integers and add.
Description: (rd, HI, LO) ← (HI, LO) + rs × rt
The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned values, to produce a 64-bit multiply result. The 64-bit multiply result is added to the contents in special registers HI and LO. The low-order 32-bit word of the result is placed into special register LO and GPR rd, and the high-order 32-bit word of the result is placed into special register HI. No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain zero-extended 32-bit values (bits 63..32 equal zero), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← (HI31..0 || LO31..0) + (0 || GPR[rs]31..0) * (0 || GPR[rt]31..0)
LO63..0 ← (prod31)^32 || prod31..0
HI63..0 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
See the Programming Notes for the MADD instruction
MADDU1: Multiply-Add Unsigned Word Pipeline 1 (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MADDU1 100001

Format: MADDU1 rs, rt
        MADDU1 rd, rs, rt
Purpose: To multiply 32-bit unsigned integers and add in Pipeline 1.
Description: (rd, HI1, LO1) ← (HI1, LO1) + rs × rt
The 32-bit value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned values, to produce a 64-bit multiply result. The 64-bit multiply result is added to the contents in special registers HI1 (= HI127..64) and LO1 (= LO127..64). The low-order 32-bit word of the result is placed into special register LO1 and GPR rd, and the high-order 32-bit word of the result is placed into special register HI1. No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain zero-extended 32-bit values (bits 63..32 equal zero), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← (HI95..64 || LO95..64) + (0 || GPR[rs]31..0) * (0 || GPR[rt]31..0)
LO127..64 ← (prod31)^32 || prod31..0
HI127..64 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
See the Programming Notes for the MADD1 instruction
MFHI1: Move From HI1 Register (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..16 (10): 0 0000000000
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MFHI1 010000

Format: MFHI1 rd
Purpose: To copy the special purpose register HI1 to a GPR.
Description: rd ← HI1
The contents of special register HI1 (= HI127..64) are loaded into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← HI127..64
Exceptions:
None
MFLO1: Move From LO1 Register (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..16 (10): 0 0000000000
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MFLO1 010010

Format: MFLO1 rd
Purpose: To copy the special purpose register LO1 to a GPR.
Description: rd ← LO1
The contents of special register LO1 (= LO127..64) are loaded into GPR rd.
Restrictions:
None
Operation:
GPR[rd]63..0 ← LO127..64
Exceptions:
None
MFSA: Move from Shift Amount Register (C790)

Encoding:
  Bits 31..26 (6): SPECIAL 000000
  Bits 25..16 (10): 0 0000000000
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MFSA 101000

Format: MFSA rd
Purpose: To copy the shift amount register SA to a GPR.
Description: rd ← SA
The contents of SA, the special register storing the funnel shift amount, are loaded into GPR rd. Note that the shift amount is encoded in SA in an implementation-defined manner. Therefore, it is not meaningful for software to operate on the value returned in rd. The sole purpose of this instruction is to permit the shift amount to be saved during a context switch. The MTSA instruction should be used to restore the state of SA.
Restrictions:
None
Operation:
GPR[rd]63..0 ← SA
Exceptions:
None
Implementation Note:
This instruction executes only in pipeline 0.
MTHI1: Move To HI1 Register (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..6 (15): 0 000000000000000
  Bits 5..0 (6): MTHI1 010001

Format: MTHI1 rs
Purpose: To copy a GPR to the special purpose register HI1.
Description: HI1 ← rs
The contents of GPR rs are loaded into special register HI1 (= HI127..64).
Restrictions:
None
Operation:
HI127..64 ← GPR[rs]63..0
Exceptions:
None
Programming Notes:
None
MTLO1: Move To LO1 Register (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..6 (15): 0 000000000000000
  Bits 5..0 (6): MTLO1 010011

Format: MTLO1 rs
Purpose: To copy a GPR to the special purpose register LO1.
Description: LO1 ← rs
The contents of GPR rs are loaded into special register LO1 (= LO127..64).
Restrictions:
None
Operation:
LO127..64 ← GPR[rs]63..0
Exceptions:
None
Programming Notes:
None
MTSA: Move to Shift Amount Register (C790)

Encoding:
  Bits 31..26 (6): SPECIAL 000000
  Bits 25..21 (5): rs
  Bits 20..6 (15): 0 000000000000000
  Bits 5..0 (6): MTSA 101001

Format: MTSA rs
Purpose: To copy a GPR to the shift amount register SA.
Description: SA ← rs
The contents of GPR rs are loaded into SA, the special register storing the funnel shift amount. Note that rs must contain a value that was originally generated by MFSA. If some other user-generated value is in rs, the shifting action performed by the funnel shifter is not defined; that is, MTSA cannot be used by a program to set a new funnel shift amount. This is because the shift amount is encoded in SA in an implementation-defined manner. The sole purpose of this instruction is to permit the shift amount to be restored during a context switch.
Restrictions:
The three instructions statically preceding an MTSA instruction must not read or write the SA register; that is, none of them may be MFSA, QFSRV, or MTSAx.
Use the MTSAB and MTSAH instructions to set a new funnel shift amount.
Operation:
SA ← GPR[rs]63..0
Exceptions:
None
Implementation Note:
1. MTSA updates the SA register in the A Stage. To keep exception processing simple, this requires that the cycle prior to MTSA not read the SA register. Also, when single-stepping, ensuring that SA always contains the value written by the SA write instruction just stepped requires that the cycle after MTSA not write the SA register. Both these rules are enforced by the architectural requirement that the three instructions prior to MTSA not read SA.
2. The MTSA instruction executes only in pipeline 0.
MTSAB: Move Byte Count to Shift Amount Register (C790)

Encoding:
  Bits 31..26 (6): REGIMM 000001
  Bits 25..21 (5): rs
  Bits 20..16 (5): MTSAB 11000
  Bits 15..0 (16): immediate

Format: MTSAB rs, immediate
Purpose: To copy a GPR to the shift amount register SA.
Description: SA ← (rs xor immediate) × 8
The least-significant four bits of GPR rs are XORed with the least-significant four bits of the immediate value. The resulting four bits are interpreted as a byte shift amount and stored into SA, the special register storing the funnel shift amount.
Restrictions:
The three instructions statically preceding a MTSAB instruction must not read the SA register; that is, they cannot be either of the instructions MFSA or QFSRV.
Operation:
SA ← (GPR[rs]3..0 xor immediate3..0) * 8
Exceptions:
None
Implementation Note:
1. MTSAB updates the SA register in the A Stage. To keep exception processing simple, this requires that the cycle prior to MTSAB not read the SA register. Also, when single-stepping, ensuring that SA always contains the value written by the SA write instruction just stepped requires that the cycle after MTSAB not write the SA register. Both these rules are enforced by the architectural requirement that the three instructions prior to MTSAB not read SA.
2. The MTSAB instruction executes only in pipeline 0.
Programming Note:
MTSAB allows the user to load either a variable shift amount or a fixed shift amount, as follows:
mtsab 0, 5     // Set shift amount to "5 bytes"
mtsab 10, 0    // Set byte shift amount to contents of GPR10
MTSAH: Move Halfword Count to Shift Amount Register (C790)

Encoding:
  Bits 31..26 (6): REGIMM 000001
  Bits 25..21 (5): rs
  Bits 20..16 (5): MTSAH 11001
  Bits 15..0 (16): immediate

Format: MTSAH rs, immediate
Purpose: To copy a GPR to the shift amount register SA.
Description: SA ← (rs xor immediate) × 16
The least-significant three bits of GPR rs are XORed with the least-significant three bits of the immediate value. The resulting three bits are interpreted as a halfword shift amount and stored into SA, the special register storing the funnel shift amount.
Restrictions:
The three instructions statically preceding an MTSAH instruction must not read the SA register; that is, they cannot be either of the instructions MFSA or QFSRV.
Operation:
SA ← (GPR[rs]2..0 xor immediate2..0) * 16
Exceptions:
None
Implementation Note:
1. MTSAH updates the SA register in the A Stage. To keep exception processing simple, this requires that the cycle prior to MTSAH not read the SA register. Also, when single-stepping, ensuring that SA always contains the value written by the SA write instruction just stepped requires that the cycle after MTSAH not write the SA register. Both these rules are enforced by the architectural requirement that the three instructions prior to MTSAH not read SA.
2. The MTSAH instruction executes only in pipeline 0.
Programming Note:
MTSAH allows the user to load either a variable shift amount or a fixed shift amount, as follows:
mtsah 0, 5     // Set shift amount to "5 halfwords"
mtsah 10, 0    // Set halfword shift amount to value of GPR10
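The shift amounts computed by MTSAB and MTSAH can be sketched in C as shown here. The function names mtsab_model and mtsah_model are hypothetical. Because the actual encoding of SA is implementation-defined, the sketch returns only the conceptual bit count given in the Operation sections, not the value the hardware stores in SA.

```c
#include <stdint.h>

/* Hypothetical model of MTSAB: low four bits of rs XOR immediate,
 * interpreted as a byte count and expressed in bits. */
unsigned mtsab_model(uint64_t rs, uint16_t immediate)
{
    return (unsigned)((rs ^ immediate) & 0xFu) * 8u;
}

/* Hypothetical model of MTSAH: low three bits of rs XOR immediate,
 * interpreted as a halfword count and expressed in bits. */
unsigned mtsah_model(uint64_t rs, uint16_t immediate)
{
    return (unsigned)((rs ^ immediate) & 0x7u) * 16u;
}
```

With rs equal to zero the immediate alone selects a fixed shift amount, and with an immediate of zero the register alone selects a variable one, matching the two usages in the Programming Notes.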
MULT: Multiply Word (C790)

Encoding:
  Bits 31..26 (6): SPECIAL 000000
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MULT 011000

Format: MULT rd, rs, rt
        MULT rs, rt
Purpose: To multiply 32-bit signed integers.
Description: (rd, LO, HI) ← rs × rt
The 32-bit value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit result. The low-order 32-bits of the result is placed into special register LO and GPR rd, and the high-order 32-bit of the result is placed into special register HI. No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain sign-extended 32-bit values (bits 63..31 equal), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← GPR[rs]31..0 * GPR[rt]31..0
LO63..0 ← (prod31)^32 || prod31..0
HI63..0 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
In the C790, the integer multiply operation allows other CPU instructions to execute out-of-order. An attempt to read the LO or HI registers before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel. Programs that require overflow detection must check for it explicitly.
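One possible form of the explicit overflow check is sketched in C here. The function name mult_model is hypothetical, and the test used (HI must equal the sign extension of LO) is a common convention rather than something this document prescribes. The sketch assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of MULT: writes the low and high 32-bit halves of
 * the signed 64-bit product, and returns true when the product does
 * not fit in a signed 32-bit value. */
bool mult_model(int32_t rs, int32_t rt, int32_t *lo, int32_t *hi)
{
    uint64_t prod = (uint64_t)((int64_t)rs * (int64_t)rt);
    *lo = (int32_t)(uint32_t)prod;
    *hi = (int32_t)(uint32_t)(prod >> 32);
    /* No overflow if HI is just the sign extension of LO. */
    return *hi != (*lo < 0 ? -1 : 0);
}
```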
MULT1: Multiply Word Pipeline 1 (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MULT1 011000

Format: MULT1 rd, rs, rt
        MULT1 rs, rt
Purpose: To multiply 32-bit signed integers in Pipeline 1.
Description: (rd, HI1, LO1) ← rs × rt
The 32-bit value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit result. The low-order 32 bits of the result are placed into special register LO1 (= LO127..64) and GPR rd, and the high-order 32 bits of the result are placed into special register HI1 (= HI127..64). No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain sign-extended 32-bit values (bits 63..31 equal), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← GPR[rs]31..0 * GPR[rt]31..0
LO127..64 ← (prod31)^32 || prod31..0
HI127..64 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
In the C790, the integer multiply operation allows other CPU instructions to execute out-of-order. An attempt to read LO1 or HI1 before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel. Programs that require overflow detection must check for it explicitly.
MULTU: Multiply Unsigned Word (C790)

Encoding:
  Bits 31..26 (6): SPECIAL 000000
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MULTU 011001

Format: MULTU rd, rs, rt
        MULTU rs, rt
Purpose: To multiply 32-bit unsigned integers.
Description: (rd, HI, LO) ← rs × rt
The 32-bit value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned values, to produce a 64-bit result. The low-order 32-bit of the result is placed into special register LO and GPR rd, and the high-order 32-bits of the result is placed into special register HI. No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain zero-extended 32-bit values (bits 63..32 equal zero), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← (0 || GPR[rs]31..0) * (0 || GPR[rt]31..0)
LO63..0 ← (prod31)^32 || prod31..0
HI63..0 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
See the Programming Notes for the MULT instruction.
MULTU1: Multiply Unsigned Word Pipeline 1 (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): 0 00000
  Bits 5..0 (6): MULTU1 011001

Format: MULTU1 rd, rs, rt
        MULTU1 rs, rt
Purpose: To multiply 32-bit unsigned integers in Pipeline 1.
Description: (rd, HI1, LO1) ← rs × rt
The 32-bit value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned values, to produce a 64-bit result. The low-order 32 bits of the result are placed into special register LO1 (= LO127..64) and GPR rd, and the high-order 32 bits of the result are placed into special register HI1 (= HI127..64). No arithmetic exception occurs under any circumstances. If GPR rd is omitted in assembly language, 0 is used as the default value.
Restrictions:
If either GPR rt or GPR rs do not contain zero-extended 32-bit values (bits 63..32 equal zero), then the result of the operation will be undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then
    UndefinedResult()
endif
prod ← (0 || GPR[rs]31..0) * (0 || GPR[rt]31..0)
LO127..64 ← (prod31)^32 || prod31..0
HI127..64 ← (prod63)^32 || prod63..32
GPR[rd]63..0 ← (prod31)^32 || prod31..0
Exceptions:
None
Programming Notes:
See the Programming Notes for the MULT1 instruction.
PABSH: Parallel Absolute Halfword (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): 0 00000
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): PABSH 00101
  Bits 5..0 (6): MMI1 101000

Format: PABSH rd, rt
Purpose: To calculate the absolute value of 8 16-bit integers in parallel.
Description: rd ← |rt|
The absolute value of the eight signed halfword values in GPR rt are placed into the corresponding eight halfwords in GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← |GPR[rt]15..0|
GPR[rd]31..16 ← |GPR[rt]31..16|
GPR[rd]47..32 ← |GPR[rt]47..32|
GPR[rd]63..48 ← |GPR[rt]63..48|
GPR[rd]79..64 ← |GPR[rt]79..64|
GPR[rd]95..80 ← |GPR[rt]95..80|
GPR[rd]111..96 ← |GPR[rt]111..96|
GPR[rd]127..112 ← |GPR[rt]127..112|

[Figure: halfwords A7..A0 of rt (bits 127..112 down to bits 15..0) map to |A7|..|A0| in the corresponding halfwords of rd.]
Supplementary explanation:
When a halfword value in GPR rt is 0x8000 (-32768), the smallest negative value, the operation results in an overflow. However, no overflow exception occurs; the result is truncated to the largest positive value, 0x7FFF (+32767).
Exceptions:
None
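The per-halfword behavior of PABSH, including the 0x8000 case from the supplementary explanation, can be modeled with the following C sketch. The type vec8h and the function pabsh_model are hypothetical names; the 128-bit register is represented as an array of eight halfwords, with index 0 standing for bits 15..0.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR viewed as eight signed halfwords. */
typedef struct {
    int16_t h[8];
} vec8h;

/* Absolute value of each halfword; -32768 becomes +32767 instead of
 * overflowing, as described in the supplementary explanation. */
vec8h pabsh_model(vec8h rt)
{
    vec8h rd;
    for (int i = 0; i < 8; i++) {
        int32_t v = rt.h[i];
        if (v == INT16_MIN) {
            rd.h[i] = INT16_MAX;
        } else {
            rd.h[i] = (int16_t)(v < 0 ? -v : v);
        }
    }
    return rd;
}
```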
PABSW: Parallel Absolute Word (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): 0 00000
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): PABSW 00001
  Bits 5..0 (6): MMI1 101000

Format: PABSW rd, rt
Purpose: To calculate the absolute value of 4 32-bit integers in parallel.
Description: rd ← |rt|
The absolute value of the four signed word values in GPR rt are placed into the corresponding four words in GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← |GPR[rt]31..0|
GPR[rd]63..32 ← |GPR[rt]63..32|
GPR[rd]95..64 ← |GPR[rt]95..64|
GPR[rd]127..96 ← |GPR[rt]127..96|

[Figure: words A3..A0 of rt (bits 127..96 down to bits 31..0) map to |A3|..|A0| in the corresponding words of rd.]
Supplementary explanation:
When a word value in GPR rt is equal to 0x80000000 (-2147483648), the smallest negative number, the operation results in an overflow. However, no overflow exception occurs; the result is truncated to the largest positive value, 0x7FFFFFFF (+2147483647).
Exceptions:
None
PADDB: Parallel Add Byte (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): PADDB 01000
  Bits 5..0 (6): MMI0 001000

Format: PADDB rd, rs, rt
Purpose: To add 16 pairs of 8-bit integers in parallel.
Description: rd ← rs + rt
The sixteen byte values in GPR rs are added to the corresponding sixteen byte values in GPR rt in parallel. The results are placed into the corresponding sixteen bytes in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]7..0 ← (GPR[rs]7..0 + GPR[rt]7..0)7..0
GPR[rd]15..8 ← (GPR[rs]15..8 + GPR[rt]15..8)7..0
GPR[rd]23..16 ← (GPR[rs]23..16 + GPR[rt]23..16)7..0
GPR[rd]31..24 ← (GPR[rs]31..24 + GPR[rt]31..24)7..0
GPR[rd]39..32 ← (GPR[rs]39..32 + GPR[rt]39..32)7..0
GPR[rd]47..40 ← (GPR[rs]47..40 + GPR[rt]47..40)7..0
GPR[rd]55..48 ← (GPR[rs]55..48 + GPR[rt]55..48)7..0
GPR[rd]63..56 ← (GPR[rs]63..56 + GPR[rt]63..56)7..0
GPR[rd]71..64 ← (GPR[rs]71..64 + GPR[rt]71..64)7..0
GPR[rd]79..72 ← (GPR[rs]79..72 + GPR[rt]79..72)7..0
GPR[rd]87..80 ← (GPR[rs]87..80 + GPR[rt]87..80)7..0
GPR[rd]95..88 ← (GPR[rs]95..88 + GPR[rt]95..88)7..0
GPR[rd]103..96 ← (GPR[rs]103..96 + GPR[rt]103..96)7..0
GPR[rd]111..104 ← (GPR[rs]111..104 + GPR[rt]111..104)7..0
GPR[rd]119..112 ← (GPR[rs]119..112 + GPR[rt]119..112)7..0
GPR[rd]127..120 ← (GPR[rs]127..120 + GPR[rt]127..120)7..0

[Figure: bytes A15..A0 of rs are added to bytes B15..B0 of rt; byte i of rd receives Ai + Bi.]
Exceptions:
None
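The wraparound behavior of PADDB (each sum truncated to its low eight bits) can be illustrated with a short C sketch. The type vec16b and the function paddb_model are hypothetical names; the 128-bit register is represented as an array of sixteen bytes, with index 0 standing for bits 7..0.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR viewed as sixteen bytes. */
typedef struct {
    uint8_t b[16];
} vec16b;

/* Adds corresponding bytes and keeps only bits 7..0 of each sum,
 * so carries never propagate into the neighboring byte. */
vec16b paddb_model(vec16b rs, vec16b rt)
{
    vec16b rd;
    for (int i = 0; i < 16; i++) {
        rd.b[i] = (uint8_t)(rs.b[i] + rt.b[i]);
    }
    return rd;
}
```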
PADDH: Parallel Add Halfword (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): PADDH 00100
  Bits 5..0 (6): MMI0 001000

Format: PADDH rd, rs, rt
Purpose: To add 8 pairs of 16-bit integers in parallel.
Description: rd ← rs + rt
The eight halfword values in GPR rs are added to the corresponding eight halfword values in GPR rt in parallel. The results are placed into the corresponding eight halfwords in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← (GPR[rs]15..0 + GPR[rt]15..0)15..0
GPR[rd]31..16 ← (GPR[rs]31..16 + GPR[rt]31..16)15..0
GPR[rd]47..32 ← (GPR[rs]47..32 + GPR[rt]47..32)15..0
GPR[rd]63..48 ← (GPR[rs]63..48 + GPR[rt]63..48)15..0
GPR[rd]79..64 ← (GPR[rs]79..64 + GPR[rt]79..64)15..0
GPR[rd]95..80 ← (GPR[rs]95..80 + GPR[rt]95..80)15..0
GPR[rd]111..96 ← (GPR[rs]111..96 + GPR[rt]111..96)15..0
GPR[rd]127..112 ← (GPR[rs]127..112 + GPR[rt]127..112)15..0

[Figure: halfwords A7..A0 of rs are added to halfwords B7..B0 of rt; halfword i of rd receives Ai + Bi.]
Exceptions:
None
PADDSB: Parallel Add with Signed Saturation Byte (C790)

Encoding:
  Bits 31..26 (6): MMI 011100
  Bits 25..21 (5): rs
  Bits 20..16 (5): rt
  Bits 15..11 (5): rd
  Bits 10..6 (5): PADDSB 11000
  Bits 5..0 (6): MMI0 001000

Format: PADDSB rd, rs, rt
Purpose: To add 16 pairs of 8-bit signed integers with saturation in parallel.
Description: rd ← rs + rt
The sixteen signed byte values in GPR rs are added to the corresponding sixteen signed byte values in GPR rt in parallel. The results are placed into the corresponding sixteen bytes in GPR rd. No overflow or underflow exceptions are generated under any circumstances. Results beyond the range of a signed byte value are saturated according to the following:
  Overflow: 0x7F
  Underflow: 0x80
This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]7..0 + GPR[rt]7..0) > 0x7F) then
    GPR[rd]7..0 ← 0x7F
else if (0x100 <= (GPR[rs]7..0 + GPR[rt]7..0) < 0x180) then
    GPR[rd]7..0 ← 0x80
else
    GPR[rd]7..0 ← (GPR[rs]7..0 + GPR[rt]7..0)7..0
endif
if ((GPR[rs]15..8 + GPR[rt]15..8) > 0x7F) then
    GPR[rd]15..8 ← 0x7F
else if (0x100 <= (GPR[rs]15..8 + GPR[rt]15..8) < 0x180) then
    GPR[rd]15..8 ← 0x80
else
    GPR[rd]15..8 ← (GPR[rs]15..8 + GPR[rt]15..8)7..0
endif
if ((GPR[rs]23..16 + GPR[rt]23..16) > 0x7F) then
    GPR[rd]23..16 ← 0x7F
else if (0x100 <= (GPR[rs]23..16 + GPR[rt]23..16) < 0x180) then
    GPR[rd]23..16 ← 0x80
else
    GPR[rd]23..16 ← (GPR[rs]23..16 + GPR[rt]23..16)7..0
endif
if ((GPR[rs]31..24 + GPR[rt]31..24) > 0x7F) then
    GPR[rd]31..24 ← 0x7F
else if (0x100 <= (GPR[rs]31..24 + GPR[rt]31..24) < 0x180) then
    GPR[rd]31..24 ← 0x80
else
    GPR[rd]31..24 ← (GPR[rs]31..24 + GPR[rt]31..24)7..0
endif
if ((GPR[rs]39..32 + GPR[rt]39..32) > 0x7F) then
    GPR[rd]39..32 ← 0x7F
else if (0x100 <= (GPR[rs]39..32 + GPR[rt]39..32) < 0x180) then
    GPR[rd]39..32 ← 0x80
else
    GPR[rd]39..32 ← (GPR[rs]39..32 + GPR[rt]39..32)7..0
endif
if ((GPR[rs]47..40 + GPR[rt]47..40) > 0x7F) then
    GPR[rd]47..40 ← 0x7F
else if (0x100 <= (GPR[rs]47..40 + GPR[rt]47..40) < 0x180) then
    GPR[rd]47..40 ← 0x80
else
    GPR[rd]47..40 ← (GPR[rs]47..40 + GPR[rt]47..40)7..0
endif
if ((GPR[rs]55..48 + GPR[rt]55..48) > 0x7F) then
    GPR[rd]55..48 ← 0x7F
else if (0x100 <= (GPR[rs]55..48 + GPR[rt]55..48) < 0x180) then
    GPR[rd]55..48 ← 0x80
else
    GPR[rd]55..48 ← (GPR[rs]55..48 + GPR[rt]55..48)7..0
endif
if ((GPR[rs]63..56 + GPR[rt]63..56) > 0x7F) then
    GPR[rd]63..56 ← 0x7F
else if (0x100 <= (GPR[rs]63..56 + GPR[rt]63..56) < 0x180) then
    GPR[rd]63..56 ← 0x80
else
    GPR[rd]63..56 ← (GPR[rs]63..56 + GPR[rt]63..56)7..0
endif
if ((GPR[rs]71..64 + GPR[rt]71..64) > 0x7F) then
    GPR[rd]71..64 ← 0x7F
else if (0x100 <= (GPR[rs]71..64 + GPR[rt]71..64) < 0x180) then
    GPR[rd]71..64 ← 0x80
else
    GPR[rd]71..64 ← (GPR[rs]71..64 + GPR[rt]71..64)7..0
endif
if ((GPR[rs]79..72 + GPR[rt]79..72) > 0x7F) then
    GPR[rd]79..72 ← 0x7F
else if (0x100 <= (GPR[rs]79..72 + GPR[rt]79..72) < 0x180) then
    GPR[rd]79..72 ← 0x80
else
    GPR[rd]79..72 ← (GPR[rs]79..72 + GPR[rt]79..72)7..0
endif
if ((GPR[rs]87..80 + GPR[rt]87..80) > 0x7F) then
    GPR[rd]87..80 ← 0x7F
else if (0x100 <= (GPR[rs]87..80 + GPR[rt]87..80) < 0x180) then
    GPR[rd]87..80 ← 0x80
else
    GPR[rd]87..80 ← (GPR[rs]87..80 + GPR[rt]87..80)7..0
endif
if ((GPR[rs]95..88 + GPR[rt]95..88) > 0x7F) then
    GPR[rd]95..88 ← 0x7F
else if (0x100 <= (GPR[rs]95..88 + GPR[rt]95..88) < 0x180) then
    GPR[rd]95..88 ← 0x80
else
    GPR[rd]95..88 ← (GPR[rs]95..88 + GPR[rt]95..88)7..0
endif
if ((GPR[rs]103..96 + GPR[rt]103..96) > 0x7F) then
    GPR[rd]103..96 ← 0x7F
else if (0x100 <= (GPR[rs]103..96 + GPR[rt]103..96) < 0x180) then
    GPR[rd]103..96 ← 0x80
else
    GPR[rd]103..96 ← (GPR[rs]103..96 + GPR[rt]103..96)7..0
endif
if ((GPR[rs]111..104 + GPR[rt]111..104) > 0x7F) then
    GPR[rd]111..104 ← 0x7F
else if (0x100 <= (GPR[rs]111..104 + GPR[rt]111..104) < 0x180) then
    GPR[rd]111..104 ← 0x80
else
    GPR[rd]111..104 ← (GPR[rs]111..104 + GPR[rt]111..104)7..0
endif
if ((GPR[rs]119..112 + GPR[rt]119..112) > 0x7F) then
    GPR[rd]119..112 ← 0x7F
else if (0x100 <= (GPR[rs]119..112 + GPR[rt]119..112) < 0x180) then
    GPR[rd]119..112 ← 0x80
else
    GPR[rd]119..112 ← (GPR[rs]119..112 + GPR[rt]119..112)7..0
endif
if ((GPR[rs]127..120 + GPR[rt]127..120) > 0x7F) then
    GPR[rd]127..120 ← 0x7F
else if (0x100 <= (GPR[rs]127..120 + GPR[rt]127..120) < 0x180) then
    GPR[rd]127..120 ← 0x80
else
    GPR[rd]127..120 ← (GPR[rs]127..120 + GPR[rt]127..120)7..0
endif
[Figure: data flow. Bytes A15..A0 of rs are added to the corresponding bytes B15..B0 of rt; each sum An + Bn is saturated to a signed byte and written to the corresponding byte of rd.]
Exceptions:
None
PADDSH    Parallel Add with Signed saturation Halfword    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PADDSH 10100  MMI0 001000
Width:  6           5       5       5       5             6
Format: PADDSH rd, rs, rt
Purpose: To add 8 pairs of 16-bit signed integers with saturation in parallel.
Description: rd ← rs + rt
The eight signed halfword values in GPR rs are added to the corresponding eight signed halfword values in GPR rt in parallel. The results are placed into the corresponding eight halfwords in GPR rd. No overflow or underflow exceptions are generated under any circumstances. Results beyond the range of a signed halfword value are saturated according to the following:
Overflow: 0x7FFF
Underflow: 0x8000
This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]15..0 + GPR[rt]15..0) > 0x7FFF) then
    GPR[rd]15..0 ← 0x7FFF
else if (0x10000 <= (GPR[rs]15..0 + GPR[rt]15..0) < 0x18000) then
    GPR[rd]15..0 ← 0x8000
else
    GPR[rd]15..0 ← (GPR[rs]15..0 + GPR[rt]15..0)15..0
endif
if ((GPR[rs]31..16 + GPR[rt]31..16) > 0x7FFF) then
    GPR[rd]31..16 ← 0x7FFF
else if (0x10000 <= (GPR[rs]31..16 + GPR[rt]31..16) < 0x18000) then
    GPR[rd]31..16 ← 0x8000
else
    GPR[rd]31..16 ← (GPR[rs]31..16 + GPR[rt]31..16)15..0
endif
if ((GPR[rs]47..32 + GPR[rt]47..32) > 0x7FFF) then
    GPR[rd]47..32 ← 0x7FFF
else if (0x10000 <= (GPR[rs]47..32 + GPR[rt]47..32) < 0x18000) then
    GPR[rd]47..32 ← 0x8000
else
    GPR[rd]47..32 ← (GPR[rs]47..32 + GPR[rt]47..32)15..0
endif
if ((GPR[rs]63..48 + GPR[rt]63..48) > 0x7FFF) then
    GPR[rd]63..48 ← 0x7FFF
else if (0x10000 <= (GPR[rs]63..48 + GPR[rt]63..48) < 0x18000) then
    GPR[rd]63..48 ← 0x8000
else
    GPR[rd]63..48 ← (GPR[rs]63..48 + GPR[rt]63..48)15..0
endif
if ((GPR[rs]79..64 + GPR[rt]79..64) > 0x7FFF) then
    GPR[rd]79..64 ← 0x7FFF
else if (0x10000 <= (GPR[rs]79..64 + GPR[rt]79..64) < 0x18000) then
    GPR[rd]79..64 ← 0x8000
else
    GPR[rd]79..64 ← (GPR[rs]79..64 + GPR[rt]79..64)15..0
endif
if ((GPR[rs]95..80 + GPR[rt]95..80) > 0x7FFF) then
    GPR[rd]95..80 ← 0x7FFF
else if (0x10000 <= (GPR[rs]95..80 + GPR[rt]95..80) < 0x18000) then
    GPR[rd]95..80 ← 0x8000
else
    GPR[rd]95..80 ← (GPR[rs]95..80 + GPR[rt]95..80)15..0
endif
if ((GPR[rs]111..96 + GPR[rt]111..96) > 0x7FFF) then
    GPR[rd]111..96 ← 0x7FFF
else if (0x10000 <= (GPR[rs]111..96 + GPR[rt]111..96) < 0x18000) then
    GPR[rd]111..96 ← 0x8000
else
    GPR[rd]111..96 ← (GPR[rs]111..96 + GPR[rt]111..96)15..0
endif
if ((GPR[rs]127..112 + GPR[rt]127..112) > 0x7FFF) then
    GPR[rd]127..112 ← 0x7FFF
else if (0x10000 <= (GPR[rs]127..112 + GPR[rt]127..112) < 0x18000) then
    GPR[rd]127..112 ← 0x8000
else
    GPR[rd]127..112 ← (GPR[rs]127..112 + GPR[rt]127..112)15..0
endif
[Figure: PADDSH data flow. Halfwords A7..A0 of rs are added to the corresponding halfwords B7..B0 of rt; each sum An+Bn is saturated to a signed halfword and written to the corresponding halfword of rd.]
Exceptions:
None
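The saturation rule above can be sketched in C as a software reference model. This is only an illustrative sketch, not C790 code: the names `sat_s16`, `paddsh_model` and `paddsh_demo` are hypothetical, and the 128-bit register is assumed to be represented as an array of eight lanes with lane 0 holding bits 15..0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a widened sum to the signed halfword range. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;   /* overflow  -> 0x7FFF */
    if (v < INT16_MIN) return INT16_MIN;   /* underflow -> 0x8000 */
    return (int16_t)v;
}

/* Assumed layout: lane 0 = bits 15..0, lane 7 = bits 127..112. */
void paddsh_model(int16_t rd[8], const int16_t rs[8], const int16_t rt[8])
{
    for (int i = 0; i < 8; i++)
        rd[i] = sat_s16((int32_t)rs[i] + (int32_t)rt[i]);
}

/* Worked example covering both saturation directions. */
void paddsh_demo(void)
{
    const int16_t a[8] = { 32767, -32768, 100, -100, 0,  1, 20000, -20000 };
    const int16_t b[8] = {     1,     -1, 200,  -50, 0, -1, 20000, -20000 };
    int16_t d[8];

    paddsh_model(d, a, b);
    assert(d[0] == 32767);    /* 32768 saturates high */
    assert(d[1] == -32768);   /* -32769 saturates low */
    assert(d[2] == 300);      /* in range, unchanged  */
    assert(d[6] == 32767);
    assert(d[7] == -32768);
}
```

Widening each lane to 32 bits before adding keeps the intermediate sum exact, so the range test is an ordinary signed comparison.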
PADDSW    Parallel Add with Signed saturation Word    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PADDSW 10000  MMI0 001000
Width:  6           5       5       5       5             6
Format: PADDSW rd, rs, rt
Purpose: To add 4 pairs of 32-bit signed integers with saturation in parallel.
Description: rd ← rs + rt
The four signed word values in GPR rs are added to the corresponding four signed word values in GPR rt in parallel. The results are placed into the corresponding four words in GPR rd. No overflow or underflow exceptions are generated under any circumstances. Results beyond the range of a signed word value are saturated according to the following:
Overflow: 0x7FFFFFFF
Underflow: 0x80000000
This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]31..0 + GPR[rt]31..0) > 0x7FFFFFFF) then
    GPR[rd]31..0 ← 0x7FFFFFFF
else if (0x100000000 <= (GPR[rs]31..0 + GPR[rt]31..0) < 0x180000000) then
    GPR[rd]31..0 ← 0x80000000
else
    GPR[rd]31..0 ← (GPR[rs]31..0 + GPR[rt]31..0)31..0
endif
if ((GPR[rs]63..32 + GPR[rt]63..32) > 0x7FFFFFFF) then
    GPR[rd]63..32 ← 0x7FFFFFFF
else if (0x100000000 <= (GPR[rs]63..32 + GPR[rt]63..32) < 0x180000000) then
    GPR[rd]63..32 ← 0x80000000
else
    GPR[rd]63..32 ← (GPR[rs]63..32 + GPR[rt]63..32)31..0
endif
if ((GPR[rs]95..64 + GPR[rt]95..64) > 0x7FFFFFFF) then
    GPR[rd]95..64 ← 0x7FFFFFFF
else if (0x100000000 <= (GPR[rs]95..64 + GPR[rt]95..64) < 0x180000000) then
    GPR[rd]95..64 ← 0x80000000
else
    GPR[rd]95..64 ← (GPR[rs]95..64 + GPR[rt]95..64)31..0
endif
if ((GPR[rs]127..96 + GPR[rt]127..96) > 0x7FFFFFFF) then
    GPR[rd]127..96 ← 0x7FFFFFFF
else if (0x100000000 <= (GPR[rs]127..96 + GPR[rt]127..96) < 0x180000000) then
    GPR[rd]127..96 ← 0x80000000
else
    GPR[rd]127..96 ← (GPR[rs]127..96 + GPR[rt]127..96)31..0
endif
[Figure: PADDSW data flow. Words A3..A0 of rs are added to the corresponding words B3..B0 of rt; each sum An+Bn is saturated to a signed word and written to the corresponding word of rd.]
Exceptions:
None
PADDUB    Parallel Add with Unsigned saturation Byte    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PADDUB 11000  MMI1 101000
Width:  6           5       5       5       5             6
Format: PADDUB rd, rs, rt
Purpose: To add 16 pairs of 8-bit unsigned integers with saturation in parallel.
Description: rd ← rs + rt
The sixteen unsigned byte values in GPR rs are added to the corresponding sixteen unsigned byte values in GPR rt in parallel. The results are placed into the corresponding sixteen bytes in GPR rd. No overflow exceptions are generated under any circumstances. Results beyond the range of an unsigned byte value are saturated according to the following:
Overflow: 0xFF
This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]7..0 + GPR[rt]7..0) > 0xFF) then
    GPR[rd]7..0 ← 0xFF
else
    GPR[rd]7..0 ← (GPR[rs]7..0 + GPR[rt]7..0)7..0
endif
if ((GPR[rs]15..8 + GPR[rt]15..8) > 0xFF) then
    GPR[rd]15..8 ← 0xFF
else
    GPR[rd]15..8 ← (GPR[rs]15..8 + GPR[rt]15..8)7..0
endif
if ((GPR[rs]23..16 + GPR[rt]23..16) > 0xFF) then
    GPR[rd]23..16 ← 0xFF
else
    GPR[rd]23..16 ← (GPR[rs]23..16 + GPR[rt]23..16)7..0
endif
if ((GPR[rs]31..24 + GPR[rt]31..24) > 0xFF) then
    GPR[rd]31..24 ← 0xFF
else
    GPR[rd]31..24 ← (GPR[rs]31..24 + GPR[rt]31..24)7..0
endif
if ((GPR[rs]39..32 + GPR[rt]39..32) > 0xFF) then
    GPR[rd]39..32 ← 0xFF
else
    GPR[rd]39..32 ← (GPR[rs]39..32 + GPR[rt]39..32)7..0
endif
if ((GPR[rs]47..40 + GPR[rt]47..40) > 0xFF) then
    GPR[rd]47..40 ← 0xFF
else
    GPR[rd]47..40 ← (GPR[rs]47..40 + GPR[rt]47..40)7..0
endif
if ((GPR[rs]55..48 + GPR[rt]55..48) > 0xFF) then
    GPR[rd]55..48 ← 0xFF
else
    GPR[rd]55..48 ← (GPR[rs]55..48 + GPR[rt]55..48)7..0
endif
if ((GPR[rs]63..56 + GPR[rt]63..56) > 0xFF) then
    GPR[rd]63..56 ← 0xFF
else
    GPR[rd]63..56 ← (GPR[rs]63..56 + GPR[rt]63..56)7..0
endif
if ((GPR[rs]71..64 + GPR[rt]71..64) > 0xFF) then
    GPR[rd]71..64 ← 0xFF
else
    GPR[rd]71..64 ← (GPR[rs]71..64 + GPR[rt]71..64)7..0
endif
if ((GPR[rs]79..72 + GPR[rt]79..72) > 0xFF) then
    GPR[rd]79..72 ← 0xFF
else
    GPR[rd]79..72 ← (GPR[rs]79..72 + GPR[rt]79..72)7..0
endif
if ((GPR[rs]87..80 + GPR[rt]87..80) > 0xFF) then
    GPR[rd]87..80 ← 0xFF
else
    GPR[rd]87..80 ← (GPR[rs]87..80 + GPR[rt]87..80)7..0
endif
if ((GPR[rs]95..88 + GPR[rt]95..88) > 0xFF) then
    GPR[rd]95..88 ← 0xFF
else
    GPR[rd]95..88 ← (GPR[rs]95..88 + GPR[rt]95..88)7..0
endif
if ((GPR[rs]103..96 + GPR[rt]103..96) > 0xFF) then
    GPR[rd]103..96 ← 0xFF
else
    GPR[rd]103..96 ← (GPR[rs]103..96 + GPR[rt]103..96)7..0
endif
if ((GPR[rs]111..104 + GPR[rt]111..104) > 0xFF) then
    GPR[rd]111..104 ← 0xFF
else
    GPR[rd]111..104 ← (GPR[rs]111..104 + GPR[rt]111..104)7..0
endif
if ((GPR[rs]119..112 + GPR[rt]119..112) > 0xFF) then
    GPR[rd]119..112 ← 0xFF
else
    GPR[rd]119..112 ← (GPR[rs]119..112 + GPR[rt]119..112)7..0
endif
if ((GPR[rs]127..120 + GPR[rt]127..120) > 0xFF) then
    GPR[rd]127..120 ← 0xFF
else
    GPR[rd]127..120 ← (GPR[rs]127..120 + GPR[rt]127..120)7..0
endif
[Figure: PADDUB data flow. Bytes A15..A0 of rs are added to the corresponding bytes B15..B0 of rt; each sum An + Bn is saturated to an unsigned byte and written to the corresponding byte of rd.]
Exceptions:
None
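As an illustration of the unsigned case, the following C sketch models the per-byte behavior described above. It is a reference model under stated assumptions, not vendor code: `paddub_model` and `paddub_demo` are hypothetical names, and lane 0 of each array is assumed to correspond to bits 7..0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: lane 0 = bits 7..0, lane 15 = bits 127..120. */
void paddub_model(uint8_t rd[16], const uint8_t rs[16], const uint8_t rt[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)rs[i] + (unsigned)rt[i];   /* 0..510 */
        rd[i] = (uint8_t)((sum > 0xFFu) ? 0xFFu : sum);     /* clamp at 0xFF */
    }
}

/* Worked example: 200 + 10*i crosses the 0xFF limit at lane 6. */
void paddub_demo(void)
{
    uint8_t a[16], b[16], d[16];

    for (int i = 0; i < 16; i++) {
        a[i] = 200;
        b[i] = (uint8_t)(i * 10);
    }
    paddub_model(d, a, b);
    assert(d[0] == 200);
    assert(d[5] == 250);   /* last lane that stays in range */
    assert(d[6] == 255);   /* 260 saturates to 0xFF */
    assert(d[15] == 255);
}
```

Only an upper clamp is needed because the sum of two unsigned values can never fall below zero.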
PADDUH    Parallel Add with Unsigned saturation Halfword    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PADDUH 10100  MMI1 101000
Width:  6           5       5       5       5             6
Format: PADDUH rd, rs, rt
Purpose: To add 8 pairs of 16-bit unsigned integers with saturation in parallel.
Description: rd ← rs + rt
The eight unsigned halfword values in GPR rs are added to the corresponding eight unsigned halfword values in GPR rt in parallel. The results are placed into the corresponding eight halfwords in GPR rd. No overflow exceptions are generated under any circumstances. Results beyond the range of an unsigned halfword value are saturated according to the following:
Overflow: 0xFFFF
This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]15..0 + GPR[rt]15..0) > 0xFFFF) then
    GPR[rd]15..0 ← 0xFFFF
else
    GPR[rd]15..0 ← (GPR[rs]15..0 + GPR[rt]15..0)15..0
endif
if ((GPR[rs]31..16 + GPR[rt]31..16) > 0xFFFF) then
    GPR[rd]31..16 ← 0xFFFF
else
    GPR[rd]31..16 ← (GPR[rs]31..16 + GPR[rt]31..16)15..0
endif
if ((GPR[rs]47..32 + GPR[rt]47..32) > 0xFFFF) then
    GPR[rd]47..32 ← 0xFFFF
else
    GPR[rd]47..32 ← (GPR[rs]47..32 + GPR[rt]47..32)15..0
endif
if ((GPR[rs]63..48 + GPR[rt]63..48) > 0xFFFF) then
    GPR[rd]63..48 ← 0xFFFF
else
    GPR[rd]63..48 ← (GPR[rs]63..48 + GPR[rt]63..48)15..0
endif
if ((GPR[rs]79..64 + GPR[rt]79..64) > 0xFFFF) then
    GPR[rd]79..64 ← 0xFFFF
else
    GPR[rd]79..64 ← (GPR[rs]79..64 + GPR[rt]79..64)15..0
endif
if ((GPR[rs]95..80 + GPR[rt]95..80) > 0xFFFF) then
    GPR[rd]95..80 ← 0xFFFF
else
    GPR[rd]95..80 ← (GPR[rs]95..80 + GPR[rt]95..80)15..0
endif
if ((GPR[rs]111..96 + GPR[rt]111..96) > 0xFFFF) then
    GPR[rd]111..96 ← 0xFFFF
else
    GPR[rd]111..96 ← (GPR[rs]111..96 + GPR[rt]111..96)15..0
endif
if ((GPR[rs]127..112 + GPR[rt]127..112) > 0xFFFF) then
    GPR[rd]127..112 ← 0xFFFF
else
    GPR[rd]127..112 ← (GPR[rs]127..112 + GPR[rt]127..112)15..0
endif
[Figure: PADDUH data flow. Halfwords A7..A0 of rs are added to the corresponding halfwords B7..B0 of rt; each sum An+Bn is saturated to an unsigned halfword and written to the corresponding halfword of rd.]
Exceptions:
None
PADDUW    Parallel Add with Unsigned saturation Word    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PADDUW 10000  MMI1 101000
Width:  6           5       5       5       5             6
Format: PADDUW rd, rs, rt
Purpose: To add 4 pairs of 32-bit unsigned integers with saturation in parallel.
Description: rd ← rs + rt
The four unsigned word values in GPR rs are added to the corresponding four unsigned word values in GPR rt in parallel. The results are placed into the corresponding four words in GPR rd. No overflow exceptions are generated under any circumstances. Results beyond the range of an unsigned word value are saturated according to the following:
Overflow: 0xFFFFFFFF
This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]31..0 + GPR[rt]31..0) > 0xFFFFFFFF) then
    GPR[rd]31..0 ← 0xFFFFFFFF
else
    GPR[rd]31..0 ← (GPR[rs]31..0 + GPR[rt]31..0)31..0
endif
if ((GPR[rs]63..32 + GPR[rt]63..32) > 0xFFFFFFFF) then
    GPR[rd]63..32 ← 0xFFFFFFFF
else
    GPR[rd]63..32 ← (GPR[rs]63..32 + GPR[rt]63..32)31..0
endif
if ((GPR[rs]95..64 + GPR[rt]95..64) > 0xFFFFFFFF) then
    GPR[rd]95..64 ← 0xFFFFFFFF
else
    GPR[rd]95..64 ← (GPR[rs]95..64 + GPR[rt]95..64)31..0
endif
if ((GPR[rs]127..96 + GPR[rt]127..96) > 0xFFFFFFFF) then
    GPR[rd]127..96 ← 0xFFFFFFFF
else
    GPR[rd]127..96 ← (GPR[rs]127..96 + GPR[rt]127..96)31..0
endif
[Figure: PADDUW data flow. Words A3..A0 of rs are added to the corresponding words B3..B0 of rt; each sum An+Bn is saturated to an unsigned word and written to the corresponding word of rd.]
Exceptions:
None
PADDW    Parallel Add Word    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PADDW 00000  MMI0 001000
Width:  6           5       5       5       5            6
Format: PADDW rd, rs, rt
Purpose: To add 4 pairs of 32-bit integers in parallel.
Description: rd ← rs + rt
The four word values in GPR rs are added to the corresponding four word values in GPR rt in parallel. The results are placed into the corresponding four words in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← (GPR[rs]31..0 + GPR[rt]31..0)31..0
GPR[rd]63..32 ← (GPR[rs]63..32 + GPR[rt]63..32)31..0
GPR[rd]95..64 ← (GPR[rs]95..64 + GPR[rt]95..64)31..0
GPR[rd]127..96 ← (GPR[rs]127..96 + GPR[rt]127..96)31..0
[Figure: PADDW data flow. Words A3..A0 of rs are added to the corresponding words B3..B0 of rt; each sum An+Bn is written to the corresponding word of rd.]
Exceptions:
None
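For contrast with the saturating forms, the non-saturating add keeps only the low 32 bits of each sum. A minimal C sketch of that behavior follows; `paddw_model` and `paddw_demo` are hypothetical names, lane 0 is assumed to be bits 31..0, and unsigned lanes are used because unsigned arithmetic in C wraps modulo 2^32 whereas signed overflow is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: lane 0 = bits 31..0, lane 3 = bits 127..96. */
void paddw_model(uint32_t rd[4], const uint32_t rs[4], const uint32_t rt[4])
{
    for (int i = 0; i < 4; i++)
        rd[i] = (uint32_t)(rs[i] + rt[i]);   /* low 32 bits of the sum */
}

/* Worked example: lanes wrap instead of saturating. */
void paddw_demo(void)
{
    const uint32_t a[4] = { 0xFFFFFFFFu, 0x7FFFFFFFu, 1u, 0u };
    const uint32_t b[4] = { 1u,          1u,          2u, 0u };
    uint32_t d[4];

    paddw_model(d, a, b);
    assert(d[0] == 0u);            /* carry out of bit 31 is discarded */
    assert(d[1] == 0x80000000u);   /* signed view wraps to the most negative value */
    assert(d[2] == 3u);
    assert(d[3] == 0u);
}
```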
PADSBH    Parallel Add/Subtract Halfword    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PADSBH 00100  MMI1 101000
Width:  6           5       5       5       5             6
Format: PADSBH rd, rs, rt
Purpose: To add/subtract 8 pairs of 16-bit integers in parallel.
Description: rd ← rs +/- rt
The high-order four halfword values in GPR rs are added to the corresponding four halfword values in GPR rt, and the low-order four halfword values in GPR rt are subtracted from the corresponding four halfword values in GPR rs, in parallel. The results are placed into the corresponding eight halfword values in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← (GPR[rs]15..0 - GPR[rt]15..0)15..0
GPR[rd]31..16 ← (GPR[rs]31..16 - GPR[rt]31..16)15..0
GPR[rd]47..32 ← (GPR[rs]47..32 - GPR[rt]47..32)15..0
GPR[rd]63..48 ← (GPR[rs]63..48 - GPR[rt]63..48)15..0
GPR[rd]79..64 ← (GPR[rs]79..64 + GPR[rt]79..64)15..0
GPR[rd]95..80 ← (GPR[rs]95..80 + GPR[rt]95..80)15..0
GPR[rd]111..96 ← (GPR[rs]111..96 + GPR[rt]111..96)15..0
GPR[rd]127..112 ← (GPR[rs]127..112 + GPR[rt]127..112)15..0
[Figure: PADSBH data flow. For the upper four halfwords, rd receives A7+B7, A6+B6, A5+B5, A4+B4; for the lower four halfwords, rd receives A3-B3, A2-B2, A1-B1, A0-B0.]
Exceptions:
None
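The mixed add/subtract behavior can be sketched in C as follows. This is an illustrative model only: `padsbh_model` and `padsbh_demo` are hypothetical names, lane 0 is assumed to be bits 15..0, and results are truncated to 16 bits as in the Operation section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: lane 0 = bits 15..0, lane 7 = bits 127..112. */
void padsbh_model(uint16_t rd[8], const uint16_t rs[8], const uint16_t rt[8])
{
    for (int i = 0; i < 4; i++)
        rd[i] = (uint16_t)(rs[i] - rt[i]);   /* low four lanes: subtract */
    for (int i = 4; i < 8; i++)
        rd[i] = (uint16_t)(rs[i] + rt[i]);   /* high four lanes: add */
}

/* Worked example: 5 - 7 wraps to 0xFFFE, 5 + 7 gives 12. */
void padsbh_demo(void)
{
    const uint16_t a[8] = { 5, 5, 5, 5, 5, 5, 5, 5 };
    const uint16_t b[8] = { 7, 7, 7, 7, 7, 7, 7, 7 };
    uint16_t d[8];

    padsbh_model(d, a, b);
    assert(d[0] == 0xFFFE);
    assert(d[3] == 0xFFFE);
    assert(d[4] == 12);
    assert(d[7] == 12);
}
```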
PAND    Parallel And    C790
Bits:   31..26      25..21  20..16  15..11  10..6       5..0
Field:  MMI 011100  rs      rt      rd      PAND 10010  MMI2 001001
Width:  6           5       5       5       5           6
Format: PAND rd, rs, rt
Purpose: To perform a bitwise logical AND.
Description: rd ← rs AND rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND operation. The result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]127..0 ← GPR[rs]127..0 and GPR[rt]127..0
[Figure: PAND data flow. Doublewords A1 and A0 of rs are ANDed with doublewords B1 and B0 of rt; rd receives A1 AND B1 in bits 127..64 and A0 AND B0 in bits 63..0.]
Exceptions:
None
PCEQB    Parallel Compare for Equal Byte    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PCEQB 01010  MMI1 101000
Width:  6           5       5       5       5            6
Format: PCEQB rd, rs, rt
Purpose: To record the result of 16 equality comparisons in parallel.
Description: rd ← (rs = rt)
The sixteen signed byte values in GPR rs are compared to the corresponding sixteen signed byte values in GPR rt, in parallel. The results of the comparison are placed into GPR rd as follows: If the signed byte value in GPR rs is equal to the corresponding signed byte value in GPR rt, then the corresponding byte in GPR rd is set to 0xFF; otherwise it is set to 0x00. This instruction operates on 128-bit registers.
Operation:
if (GPR[rs]7..0 = GPR[rt]7..0) then
    GPR[rd]7..0 ← 1^8
else
    GPR[rd]7..0 ← 0^8
endif
if (GPR[rs]15..8 = GPR[rt]15..8) then
    GPR[rd]15..8 ← 1^8
else
    GPR[rd]15..8 ← 0^8
endif
if (GPR[rs]23..16 = GPR[rt]23..16) then
    GPR[rd]23..16 ← 1^8
else
    GPR[rd]23..16 ← 0^8
endif
if (GPR[rs]31..24 = GPR[rt]31..24) then
    GPR[rd]31..24 ← 1^8
else
    GPR[rd]31..24 ← 0^8
endif
if (GPR[rs]39..32 = GPR[rt]39..32) then
    GPR[rd]39..32 ← 1^8
else
    GPR[rd]39..32 ← 0^8
endif
if (GPR[rs]47..40 = GPR[rt]47..40) then
    GPR[rd]47..40 ← 1^8
else
    GPR[rd]47..40 ← 0^8
endif
if (GPR[rs]55..48 = GPR[rt]55..48) then
    GPR[rd]55..48 ← 1^8
else
    GPR[rd]55..48 ← 0^8
endif
if (GPR[rs]63..56 = GPR[rt]63..56) then
    GPR[rd]63..56 ← 1^8
else
    GPR[rd]63..56 ← 0^8
endif
if (GPR[rs]71..64 = GPR[rt]71..64) then
    GPR[rd]71..64 ← 1^8
else
    GPR[rd]71..64 ← 0^8
endif
if (GPR[rs]79..72 = GPR[rt]79..72) then
    GPR[rd]79..72 ← 1^8
else
    GPR[rd]79..72 ← 0^8
endif
if (GPR[rs]87..80 = GPR[rt]87..80) then
    GPR[rd]87..80 ← 1^8
else
    GPR[rd]87..80 ← 0^8
endif
if (GPR[rs]95..88 = GPR[rt]95..88) then
    GPR[rd]95..88 ← 1^8
else
    GPR[rd]95..88 ← 0^8
endif
if (GPR[rs]103..96 = GPR[rt]103..96) then
    GPR[rd]103..96 ← 1^8
else
    GPR[rd]103..96 ← 0^8
endif
if (GPR[rs]111..104 = GPR[rt]111..104) then
    GPR[rd]111..104 ← 1^8
else
    GPR[rd]111..104 ← 0^8
endif
if (GPR[rs]119..112 = GPR[rt]119..112) then
    GPR[rd]119..112 ← 1^8
else
    GPR[rd]119..112 ← 0^8
endif
if (GPR[rs]127..120 = GPR[rt]127..120) then
    GPR[rd]127..120 ← 1^8
else
    GPR[rd]127..120 ← 0^8
endif
[Figure: PCEQB example. Each byte An of rs is compared for equality with the corresponding byte Bn of rt; a true comparison sets the corresponding byte of rd to 1^8 and a false comparison sets it to 0^8.]
Exceptions:
None
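The all-ones or all-zeros result per byte makes the output usable directly as a mask. A small C sketch of the comparison follows; `pceqb_model` and `pceqb_demo` are hypothetical names, and lane 0 is assumed to be bits 7..0. Unsigned lanes are used because signedness does not affect an equality test.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: lane 0 = bits 7..0, lane 15 = bits 127..120. */
void pceqb_model(uint8_t rd[16], const uint8_t rs[16], const uint8_t rt[16])
{
    for (int i = 0; i < 16; i++)
        rd[i] = (rs[i] == rt[i]) ? 0xFF : 0x00;   /* 1^8 if equal, else 0^8 */
}

/* Worked example: two lanes are made to differ, the rest match. */
void pceqb_demo(void)
{
    uint8_t a[16], b[16], m[16];

    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)i;
        b[i] = (uint8_t)i;
    }
    b[3]  = 99;
    b[10] = 0;
    pceqb_model(m, a, b);
    assert(m[0]  == 0xFF);
    assert(m[3]  == 0x00);
    assert(m[10] == 0x00);
    assert(m[15] == 0xFF);
}
```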
PCEQH    Parallel Compare for Equal Halfword    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PCEQH 00110  MMI1 101000
Width:  6           5       5       5       5            6
Format: PCEQH rd, rs, rt
Purpose: To record the results of 8 equality comparisons in parallel.
Description: rd ← (rs = rt)
The eight signed halfword values in GPR rs are compared to the corresponding eight signed halfword values in GPR rt, in parallel. The results of the comparison are placed into GPR rd as follows: If the signed halfword value in GPR rs is equal to the corresponding signed halfword value in GPR rt, then the corresponding halfword in GPR rd is set to 0xFFFF; otherwise it is set to 0x0000. This instruction operates on 128-bit registers.
Operation:
if (GPR[rs]15..0 = GPR[rt]15..0) then
    GPR[rd]15..0 ← 1^16
else
    GPR[rd]15..0 ← 0^16
endif
if (GPR[rs]31..16 = GPR[rt]31..16) then
    GPR[rd]31..16 ← 1^16
else
    GPR[rd]31..16 ← 0^16
endif
if (GPR[rs]47..32 = GPR[rt]47..32) then
    GPR[rd]47..32 ← 1^16
else
    GPR[rd]47..32 ← 0^16
endif
if (GPR[rs]63..48 = GPR[rt]63..48) then
    GPR[rd]63..48 ← 1^16
else
    GPR[rd]63..48 ← 0^16
endif
if (GPR[rs]79..64 = GPR[rt]79..64) then
    GPR[rd]79..64 ← 1^16
else
    GPR[rd]79..64 ← 0^16
endif
if (GPR[rs]95..80 = GPR[rt]95..80) then
    GPR[rd]95..80 ← 1^16
else
    GPR[rd]95..80 ← 0^16
endif
if (GPR[rs]111..96 = GPR[rt]111..96) then
    GPR[rd]111..96 ← 1^16
else
    GPR[rd]111..96 ← 0^16
endif
if (GPR[rs]127..112 = GPR[rt]127..112) then
    GPR[rd]127..112 ← 1^16
else
    GPR[rd]127..112 ← 0^16
endif
[Figure: PCEQH example. Each halfword An of rs is compared for equality with the corresponding halfword Bn of rt; a true comparison sets the corresponding halfword of rd to 1^16 and a false comparison sets it to 0^16.]
Exceptions:
None
PCEQW    Parallel Compare for Equal Word    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PCEQW 00010  MMI1 101000
Width:  6           5       5       5       5            6
Format: PCEQW rd, rs, rt
Purpose: To record the result of 4 equality comparisons in parallel.
Description: rd ← (rs = rt)
The four signed word values in GPR rs are compared to the corresponding four signed word values in GPR rt, in parallel. The results of the comparison are placed into GPR rd as follows: If the signed word value in GPR rs is equal to the corresponding signed word value in GPR rt, then the corresponding word in GPR rd is set to 0xFFFFFFFF; otherwise it is set to 0x00000000. This instruction operates on 128-bit registers.
Operation:
if (GPR[rs]31..0 = GPR[rt]31..0) then
    GPR[rd]31..0 ← 1^32
else
    GPR[rd]31..0 ← 0^32
endif
if (GPR[rs]63..32 = GPR[rt]63..32) then
    GPR[rd]63..32 ← 1^32
else
    GPR[rd]63..32 ← 0^32
endif
if (GPR[rs]95..64 = GPR[rt]95..64) then
    GPR[rd]95..64 ← 1^32
else
    GPR[rd]95..64 ← 0^32
endif
if (GPR[rs]127..96 = GPR[rt]127..96) then
    GPR[rd]127..96 ← 1^32
else
    GPR[rd]127..96 ← 0^32
endif
[Figure: PCEQW example. Each word An of rs is compared for equality with the corresponding word Bn of rt; a true comparison sets the corresponding word of rd to 1^32 and a false comparison sets it to 0^32.]
Exceptions:
None
PCGTB    Parallel Compare for Greater Than Byte    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PCGTB 01010  MMI0 001000
Width:  6           5       5       5       5            6
Format: PCGTB rd, rs, rt
Purpose: To record the result of 16 greater-than comparisons in parallel.
Description: rd ← (rs > rt)
The sixteen signed byte values in GPR rs are compared to the corresponding sixteen signed byte values in GPR rt in parallel. The results of the comparison are placed into GPR rd as follows: If the signed byte value in GPR rs is greater than the corresponding signed byte value in GPR rt, then the corresponding byte in GPR rd is set to 0xFF; otherwise it is set to 0x00. This instruction operates on 128-bit registers.
Operation:
if (GPR[rs]7..0 > GPR[rt]7..0) then
    GPR[rd]7..0 ← 1^8
else
    GPR[rd]7..0 ← 0^8
endif
if (GPR[rs]15..8 > GPR[rt]15..8) then
    GPR[rd]15..8 ← 1^8
else
    GPR[rd]15..8 ← 0^8
endif
if (GPR[rs]23..16 > GPR[rt]23..16) then
    GPR[rd]23..16 ← 1^8
else
    GPR[rd]23..16 ← 0^8
endif
if (GPR[rs]31..24 > GPR[rt]31..24) then
    GPR[rd]31..24 ← 1^8
else
    GPR[rd]31..24 ← 0^8
endif
if (GPR[rs]39..32 > GPR[rt]39..32) then
    GPR[rd]39..32 ← 1^8
else
    GPR[rd]39..32 ← 0^8
endif
if (GPR[rs]47..40 > GPR[rt]47..40) then
    GPR[rd]47..40 ← 1^8
else
    GPR[rd]47..40 ← 0^8
endif
if (GPR[rs]55..48 > GPR[rt]55..48) then
    GPR[rd]55..48 ← 1^8
else
    GPR[rd]55..48 ← 0^8
endif
if (GPR[rs]63..56 > GPR[rt]63..56) then
    GPR[rd]63..56 ← 1^8
else
    GPR[rd]63..56 ← 0^8
endif
if (GPR[rs]71..64 > GPR[rt]71..64) then
    GPR[rd]71..64 ← 1^8
else
    GPR[rd]71..64 ← 0^8
endif
if (GPR[rs]79..72 > GPR[rt]79..72) then
    GPR[rd]79..72 ← 1^8
else
    GPR[rd]79..72 ← 0^8
endif
if (GPR[rs]87..80 > GPR[rt]87..80) then
    GPR[rd]87..80 ← 1^8
else
    GPR[rd]87..80 ← 0^8
endif
if (GPR[rs]95..88 > GPR[rt]95..88) then
    GPR[rd]95..88 ← 1^8
else
    GPR[rd]95..88 ← 0^8
endif
if (GPR[rs]103..96 > GPR[rt]103..96) then
    GPR[rd]103..96 ← 1^8
else
    GPR[rd]103..96 ← 0^8
endif
if (GPR[rs]111..104 > GPR[rt]111..104) then
    GPR[rd]111..104 ← 1^8
else
    GPR[rd]111..104 ← 0^8
endif
if (GPR[rs]119..112 > GPR[rt]119..112) then
    GPR[rd]119..112 ← 1^8
else
    GPR[rd]119..112 ← 0^8
endif
if (GPR[rs]127..120 > GPR[rt]127..120) then
    GPR[rd]127..120 ← 1^8
else
    GPR[rd]127..120 ← 0^8
endif
[Figure: PCGTB example. Each byte An of rs is compared with the corresponding byte Bn of rt; where An > Bn is true the corresponding byte of rd is set to 1^8, and where it is false the byte is set to 0^8.]
Exceptions:
None
PCGTH    Parallel Compare for Greater Than Halfword    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PCGTH 00110  MMI0 001000
Width:  6           5       5       5       5            6
Format: PCGTH rd, rs, rt
Purpose: To record the results of 8 greater-than comparisons in parallel.
Description: rd ← (rs > rt)
The eight signed halfword values in GPR rs are compared to the corresponding eight signed halfword values in GPR rt in parallel. The results of the comparison are placed into GPR rd as follows: If the signed halfword value in GPR rs is greater than the corresponding signed halfword value in GPR rt, then the corresponding halfword in GPR rd is set to 0xFFFF; otherwise it is set to 0x0000. This instruction operates on 128-bit registers.
Operation:
if (GPR[rs]15..0 > GPR[rt]15..0) then
    GPR[rd]15..0 ← 1^16
else
    GPR[rd]15..0 ← 0^16
endif
if (GPR[rs]31..16 > GPR[rt]31..16) then
    GPR[rd]31..16 ← 1^16
else
    GPR[rd]31..16 ← 0^16
endif
if (GPR[rs]47..32 > GPR[rt]47..32) then
    GPR[rd]47..32 ← 1^16
else
    GPR[rd]47..32 ← 0^16
endif
if (GPR[rs]63..48 > GPR[rt]63..48) then
    GPR[rd]63..48 ← 1^16
else
    GPR[rd]63..48 ← 0^16
endif
if (GPR[rs]79..64 > GPR[rt]79..64) then
    GPR[rd]79..64 ← 1^16
else
    GPR[rd]79..64 ← 0^16
endif
if (GPR[rs]95..80 > GPR[rt]95..80) then
    GPR[rd]95..80 ← 1^16
else
    GPR[rd]95..80 ← 0^16
endif
if (GPR[rs]111..96 > GPR[rt]111..96) then
    GPR[rd]111..96 ← 1^16
else
    GPR[rd]111..96 ← 0^16
endif
if (GPR[rs]127..112 > GPR[rt]127..112) then
    GPR[rd]127..112 ← 1^16
else
    GPR[rd]127..112 ← 0^16
endif
[Figure: PCGTH example. Each halfword An of rs is compared with the corresponding halfword Bn of rt; where An > Bn is true the corresponding halfword of rd is set to 1^16, and where it is false the halfword is set to 0^16.]
Exceptions:
None
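Because the comparison is signed, a halfword with bit 15 set is treated as negative. The C sketch below models this; `pcgth_model` and `pcgth_demo` are hypothetical names, lane 0 is assumed to be bits 15..0, and the inputs are typed as `int16_t` so that the C `>` operator performs the signed comparison the description calls for.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: lane 0 = bits 15..0, lane 7 = bits 127..112. */
void pcgth_model(uint16_t rd[8], const int16_t rs[8], const int16_t rt[8])
{
    for (int i = 0; i < 8; i++)
        rd[i] = (rs[i] > rt[i]) ? 0xFFFF : 0x0000;   /* 1^16 or 0^16 */
}

/* Worked example showing the signed interpretation. */
void pcgth_demo(void)
{
    const int16_t a[8] = { -32768,      1, 5, -1, 0, 0, 0, 0 };
    const int16_t b[8] = {      1, -32768, 5, -2, 0, 0, 0, 0 };
    uint16_t m[8];

    pcgth_model(m, a, b);
    assert(m[0] == 0x0000);   /* 0x8000 is -32768, not greater than 1 */
    assert(m[1] == 0xFFFF);
    assert(m[2] == 0x0000);   /* equal values are not greater-than */
    assert(m[3] == 0xFFFF);   /* -1 > -2 */
    assert(m[4] == 0x0000);
}
```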
PCGTW    Parallel Compare for Greater Than Word    C790
Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PCGTW 00010  MMI0 001000
Width:  6           5       5       5       5            6
Format: PCGTW rd, rs, rt
Purpose: To record the results of 4 greater-than comparisons in parallel.
Description: rd ← (rs > rt)
The four signed word values in GPR rs are compared to the corresponding four signed word values in GPR rt in parallel. The results of the comparison are placed into GPR rd as follows: If the signed word value in GPR rs is greater than the corresponding signed word value in GPR rt, then the corresponding word in GPR rd is set to 0xFFFFFFFF; otherwise it is set to 0x00000000. This instruction operates on 128-bit registers.
Operation:
if (GPR[rs]31..0 > GPR[rt]31..0) then
    GPR[rd]31..0 ← 1^32
else
    GPR[rd]31..0 ← 0^32
endif
if (GPR[rs]63..32 > GPR[rt]63..32) then
    GPR[rd]63..32 ← 1^32
else
    GPR[rd]63..32 ← 0^32
endif
if (GPR[rs]95..64 > GPR[rt]95..64) then
    GPR[rd]95..64 ← 1^32
else
    GPR[rd]95..64 ← 0^32
endif
if (GPR[rs]127..96 > GPR[rt]127..96) then
    GPR[rd]127..96 ← 1^32
else
    GPR[rd]127..96 ← 0^32
endif
[Figure: PCGTW example. Each word An of rs is compared with the corresponding word Bn of rt; where An > Bn is true the corresponding word of rd is set to 1^32, and where it is false the word is set to 0^32.]
Exceptions:
None
PCPYH    Parallel Copy Halfword    C790
Bits:   31..26      25..21   20..16  15..11  10..6        5..0
Field:  MMI 011100  0 00000  rt      rd      PCPYH 11011  MMI3 101001
Width:  6           5        5       5       5            6
Format: PCPYH rd, rt
Purpose: To copy halfword.
Description: rd ← copy (rt)
The low-order halfword of each of the two doublewords in GPR rt is copied to all four halfwords of the corresponding doubleword in GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]15..0
GPR[rd]31..16 ← GPR[rt]15..0
GPR[rd]47..32 ← GPR[rt]15..0
GPR[rd]63..48 ← GPR[rt]15..0
GPR[rd]79..64 ← GPR[rt]79..64
GPR[rd]95..80 ← GPR[rt]79..64
GPR[rd]111..96 ← GPR[rt]79..64
GPR[rd]127..112 ← GPR[rt]79..64
[Figure: PCPYH data flow. Halfword A1 (bits 79..64 of rt) is copied to the four halfwords in bits 127..64 of rd; halfword A0 (bits 15..0 of rt) is copied to the four halfwords in bits 63..0 of rd.]
Exceptions:
None
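The broadcast within each doubleword can be sketched in C as shown below. This is an illustrative model, with `pcpyh_model` and `pcpyh_demo` as hypothetical names and lane 0 assumed to be bits 15..0, so that lane 4 is bits 79..64.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: lane 0 = bits 15..0, lane 4 = bits 79..64. */
void pcpyh_model(uint16_t rd[8], const uint16_t rt[8])
{
    /* Read both sources first so the model also works when rd aliases rt. */
    const uint16_t lo = rt[0];
    const uint16_t hi = rt[4];

    for (int i = 0; i < 4; i++)
        rd[i] = lo;   /* lower doubleword */
    for (int i = 4; i < 8; i++)
        rd[i] = hi;   /* upper doubleword */
}

/* Worked example: only lanes 0 and 4 of the source are propagated. */
void pcpyh_demo(void)
{
    uint16_t t[8], d[8];

    for (int i = 0; i < 8; i++)
        t[i] = (uint16_t)(0x1111 * (i + 1));   /* 0x1111, 0x2222, ... 0x8888 */
    pcpyh_model(d, t);
    assert(d[0] == 0x1111);
    assert(d[3] == 0x1111);
    assert(d[4] == 0x5555);
    assert(d[7] == 0x5555);
}
```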
PCPYLD    Parallel Copy Lower Doubleword    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PCPYLD 01110  MMI2 001001
Width:  6           5       5       5       5             6
Format: PCPYLD rd, rs, rt
Purpose: To copy doubleword.
Description: rd ← copy (rs, rt)
The contents of the low-order doubleword in GPR rs are combined with the contents of the low-order doubleword in GPR rt. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
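GPR[rd]63..0 ← GPR[rt]63..0
GPR[rd]127..64 ← GPR[rs]63..0
[Figure: PCPYLD data flow. The lower doubleword A0 of rs is written to bits 127..64 of rd; the lower doubleword B0 of rt is written to bits 63..0 of rd.]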
Exceptions:
None
PCPYUD    Parallel Copy Upper Doubleword    C790
Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PCPYUD 01110  MMI3 101001
Width:  6           5       5       5       5             6
Format: PCPYUD rd, rs, rt
Purpose: To copy doubleword.
Description: rd ← copy (rs, rt)
The contents of the high-order doubleword in GPR rs are combined with the contents of the high-order doubleword in GPR rt. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]63..0 ← GPR[rs]127..64
GPR[rd]127..64 ← GPR[rt]127..64
[Figure: PCPYUD data flow. The upper doubleword of rs is written to bits 63..0 of rd; the upper doubleword of rt is written to bits 127..64 of rd.]
Exceptions:
None
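The two doubleword copy instructions differ only in which halves they take and where those halves land, which the following C sketch makes explicit. The type `quad_t` and the functions `pcpyld_model`, `pcpyud_model` and `pcpy_demo` are hypothetical names for this illustration; a 128-bit register is assumed to be modeled as two 64-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit register model: lo = bits 63..0, hi = bits 127..64. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} quad_t;

/* PCPYLD: rd[63..0] <- rt[63..0], rd[127..64] <- rs[63..0] */
quad_t pcpyld_model(quad_t rs, quad_t rt)
{
    quad_t rd = { rt.lo, rs.lo };
    return rd;
}

/* PCPYUD: rd[63..0] <- rs[127..64], rd[127..64] <- rt[127..64] */
quad_t pcpyud_model(quad_t rs, quad_t rt)
{
    quad_t rd = { rs.hi, rt.hi };
    return rd;
}

/* Worked example with distinct values in every half. */
void pcpy_demo(void)
{
    const quad_t s = { 0x1, 0x2 };   /* rs: lo = 1, hi = 2 */
    const quad_t t = { 0x3, 0x4 };   /* rt: lo = 3, hi = 4 */

    quad_t ld = pcpyld_model(s, t);
    assert(ld.lo == 0x3 && ld.hi == 0x1);

    quad_t ud = pcpyud_model(s, t);
    assert(ud.lo == 0x2 && ud.hi == 0x4);
}
```

Note that rs supplies the upper half of the result in PCPYLD but the lower half in PCPYUD, as stated in the respective Operation sections.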
PDIVBW    Parallel Divide Broadcast Word    C790
Bits:   31..26      25..21  20..16  15..11   10..6         5..0
Field:  MMI 011100  rs      rt      0 00000  PDIVBW 11101  MMI2 001001
Width:  6           5       5       5        5             6
Format: PDIVBW rs, rt
Purpose: To divide four 32-bit signed integers by a 16-bit signed integer in parallel.
Description: (LO, HI) ← rs / rt
The four signed words in GPR rs are divided by the low-order signed halfword in GPR rt, in parallel. The four 32-bit quotients are placed into special register LO. The four 16-bit remainders are placed into special register HI. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If the divisor in GPR rt is zero, the arithmetic result value is undefined.
Operation:
q0 ← GPR[rs]31..0 div GPR[rt]15..0
r0 ← GPR[rs]31..0 mod GPR[rt]15..0
q1 ← GPR[rs]63..32 div GPR[rt]15..0
r1 ← GPR[rs]63..32 mod GPR[rt]15..0
q2 ← GPR[rs]95..64 div GPR[rt]15..0
r2 ← GPR[rs]95..64 mod GPR[rt]15..0
q3 ← GPR[rs]127..96 div GPR[rt]15..0
r3 ← GPR[rs]127..96 mod GPR[rt]15..0
LO31..0 ← q0_31..0
HI31..0 ← (r0_15)^16 || r0_15..0
LO63..32 ← q1_31..0
HI63..32 ← (r1_15)^16 || r1_15..0
LO95..64 ← q2_31..0
HI95..64 ← (r2_15)^16 || r2_15..0
LO127..96 ← q3_31..0
HI127..96 ← (r3_15)^16 || r3_15..0
Register layout (rt supplies only its low-order halfword, bits 15..0 = B0):
  Bits   127..96                95..64                 63..32                 31..0
  rs     A3                     A2                     A1                     A0
  HI     sign ext (A3 mod B0)   sign ext (A2 mod B0)   sign ext (A1 mod B0)   sign ext (A0 mod B0)
  LO     A3 div B0              A2 div B0              A1 div B0              A0 div B0
Supplementary explanation:
When 0x80000000 (-2147483648), the most negative value, is divided by 0xFFFF (-1), the operation overflows. However, no overflow exception occurs, and the result is as follows: the quotient is 0x80000000 (-2147483648) and the remainder is 0x00000000 (0).
Exceptions:
None
Programming Notes:
In the C790 the integer divide operation proceeds asynchronously and allows other CPU instructions to execute before it is retired. An attempt to read LO or HI before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the divide so that other instructions can execute in parallel. No arithmetic exception occurs under any circumstances. If divide-by-zero or overflow conditions should be detected and some action taken, then the divide instruction is typically followed by additional instructions to check for a zero divisor and / or for overflow. If the divide is asynchronous then the zero-divisor check can execute in parallel with the divide. The action taken on either divide-by-zero or overflow is either a convention within the program itself or more typically, the system software; one possibility is to take a BREAK exception with a code field value to signal the problem to the system software. As an example, the C programming language in a UNIX environment expects division by zero to either terminate the program or execute a program-specified signal handler. C does not expect overflow to cause any exceptional condition. If the C compiler uses a divide instruction, it also emits code to test for a zero divisor and execute a BREAK instruction to inform the operating system if one is detected.
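As an illustration of the PDIVBW data flow and of the overflow case described above, the following is a minimal sketch in Python. It assumes 128-bit register values are held in Python integers and that the quotient is truncated toward zero with the remainder taking the sign of the dividend; the rounding direction is an assumption, since the text above does not state it. The names `to_signed`, `trunc_divmod` and `pdivbw` are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def trunc_divmod(a: int, b: int):
    """Quotient truncated toward zero; remainder has the dividend's sign (assumed)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b

def pdivbw(rs: int, rt: int):
    """Return (LO, HI) as 128-bit integers. The divisor must be nonzero."""
    divisor = to_signed(rt, 16)           # broadcast low-order halfword of rt
    lo = hi = 0
    for lane in range(4):
        dividend = to_signed(rs >> (32 * lane), 32)
        q, r = trunc_divmod(dividend, divisor)
        lo |= (q & 0xFFFFFFFF) << (32 * lane)                 # quotient, low 32 bits
        hi |= (to_signed(r, 16) & 0xFFFFFFFF) << (32 * lane)  # remainder, sign-extended
    return lo, hi
```

Because the quotient is masked to 32 bits, dividing 0x80000000 by 0xFFFF yields the quotient 0x80000000 and remainder 0, matching the supplementary explanation. A zero divisor is left to the caller, mirroring the software check described in the Programming Notes.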
PDIVUW: Parallel Divide Unsigned Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 0 00000 | 10..6 PDIVUW 01101 | 5..0 MMI3 101001
Format: PDIVUW rs, rt
Purpose: To divide 2 pairs of 32-bit unsigned integers in parallel.
Description: (LO, HI) ← rs / rt
The low-order unsigned words of the two doublewords in GPR rs are divided by the low-order unsigned words of the two doublewords in GPR rt, in parallel. The two 32-bit quotients are placed into special register LO. The two 32-bit remainders are placed into special register HI. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain zero-extended 32-bit values (bits 127..96 and 63..32 equal zero), the result of the operation is undefined. If the divisor in GPR rt is zero, the result is undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
q0 ← (0 || GPR[rs]31..0) div (0 || GPR[rt]31..0)
r0 ← (0 || GPR[rs]31..0) mod (0 || GPR[rt]31..0)
q1 ← (0 || GPR[rs]95..64) div (0 || GPR[rt]95..64)
r1 ← (0 || GPR[rs]95..64) mod (0 || GPR[rt]95..64)
LO63..0 ← (q0_31)^32 || q0_31..0
HI63..0 ← (r0_31)^32 || r0_31..0
LO127..64 ← (q1_31)^32 || q1_31..0
HI127..64 ← (r1_31)^32 || r1_31..0
Register layout (rs holds A1 in bits 95..64 and A0 in bits 31..0; rt holds B1 and B0 likewise):
  Bits   127..64                              63..0
  HI     sign ext ((0 || A1) mod (0 || B1))   sign ext ((0 || A0) mod (0 || B0))
  LO     sign ext ((0 || A1) div (0 || B1))   sign ext ((0 || A0) div (0 || B0))
Exceptions:
None
Programming Notes:
See the Programming Notes for the PDIVBW instruction.
PDIVW: Parallel Divide Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 0 00000 | 10..6 PDIVW 01101 | 5..0 MMI2 001001
Format: PDIVW rs, rt
Purpose: To divide 2 pairs of 32-bit signed integers in parallel.
Description: (LO, HI) ← rs / rt
The low-order signed words of the two doublewords in GPR rs are divided by the low-order signed words of the two doublewords in GPR rt, in parallel. The two 32-bit quotients are placed into special register LO. The two 32-bit remainders are placed into special register HI. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain sign-extended 32-bit values (bits 127..95 equal and 63..31 equal), the result of the operation is undefined. If the divisor in GPR rt is zero, the result is undefined.
Operation:
if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then UndefinedResult() endif
q0 ← GPR[rs]31..0 div GPR[rt]31..0
r0 ← GPR[rs]31..0 mod GPR[rt]31..0
q1 ← GPR[rs]95..64 div GPR[rt]95..64
r1 ← GPR[rs]95..64 mod GPR[rt]95..64
LO63..0 ← (q0_31)^32 || q0_31..0
HI63..0 ← (r0_31)^32 || r0_31..0
LO127..64 ← (q1_31)^32 || q1_31..0
HI127..64 ← (r1_31)^32 || r1_31..0
Register layout (rs holds A1 in bits 95..64 and A0 in bits 31..0; rt holds B1 and B0 likewise):
  Bits   127..64                63..0
  HI     sign ext (A1 mod B1)   sign ext (A0 mod B0)
  LO     sign ext (A1 div B1)   sign ext (A0 div B0)
Supplementary explanation:
When 0x80000000 (-2147483648), the most negative value, is divided by 0xFFFFFFFF (-1), the operation overflows. However, no overflow exception occurs, and the result is as follows: the quotient (q) is 0x80000000 (-2147483648) and the remainder (r) is 0x00000000 (0).
Exceptions:
None
Programming Notes:
See the Programming Notes for the PDIVBW instruction.
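The PDIVW operation can be sketched as follows, assuming 128-bit register values are held in Python integers and that division truncates toward zero (an assumption; the text does not state the rounding direction). The helper names are hypothetical. Both divisors must be nonzero.

```python
M64 = (1 << 64) - 1

def to_signed32(word: int) -> int:
    """Interpret the low 32 bits as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word

def sign_extend_to_64(word: int) -> int:
    """(w_31)^32 || w_31..0, returned as an unsigned 64-bit pattern."""
    return to_signed32(word) & M64

def pdivw(rs: int, rt: int):
    """Return (LO, HI) as 128-bit integers."""
    lo = hi = 0
    for lane in range(2):
        a = to_signed32(rs >> (64 * lane))   # low-order word of each doubleword
        b = to_signed32(rt >> (64 * lane))
        q = abs(a) // abs(b)
        if (a < 0) != (b < 0):
            q = -q
        r = a - q * b
        lo |= sign_extend_to_64(q) << (64 * lane)
        hi |= sign_extend_to_64(r) << (64 * lane)
    return lo, hi
```

In the overflow case the 32-bit quotient 0x80000000 is sign-extended, so the low doubleword of LO reads 0xFFFFFFFF80000000 in this model.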
PEXCH: Parallel Exchange Center Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 0 00000 | 20..16 rt | 15..11 rd | 10..6 PEXCH 11010 | 5..0 MMI3 101001
Format: PEXCH rd, rt
Purpose: To exchange halfwords.
Description: rd ← exchange (rt)
The two central halfwords of the high-order doubleword in GPR rt are exchanged and the two central halfwords of the low-order doubleword in GPR rt are exchanged. The results are copied to GPR rd while other halfwords are copied directly to the corresponding halfwords. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]15..0
GPR[rd]31..16 ← GPR[rt]47..32
GPR[rd]47..32 ← GPR[rt]31..16
GPR[rd]63..48 ← GPR[rt]63..48
GPR[rd]79..64 ← GPR[rt]79..64
GPR[rd]95..80 ← GPR[rt]111..96
GPR[rd]111..96 ← GPR[rt]95..80
GPR[rd]127..112 ← GPR[rt]127..112
Register layout (halfwords):
  Bits   127..112  111..96  95..80  79..64  63..48  47..32  31..16  15..0
  rt     A7        A6       A5      A4      A3      A2      A1      A0
  rd     A7        A5       A6      A4      A3      A1      A2      A0
Exceptions:
None
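The halfword permutation performed by PEXCH can be checked with a small model. This is a sketch only, assuming 128-bit values are held in Python integers; `halfwords`, `pack` and `pexch` are illustrative names.

```python
def halfwords(reg: int):
    """Split a 128-bit value into eight halfwords; index 0 is bits 15..0."""
    return [(reg >> (16 * i)) & 0xFFFF for i in range(8)]

def pack(hw):
    """Inverse of halfwords()."""
    return sum(h << (16 * i) for i, h in enumerate(hw))

def pexch(rt: int) -> int:
    a = halfwords(rt)
    # Swap the two central halfwords of each doubleword; the others pass through.
    return pack([a[0], a[2], a[1], a[3], a[4], a[6], a[5], a[7]])

rt = pack([0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7])
rd = pexch(rt)
```

Since the operation is a pure swap, applying it twice returns the original value. The same helpers could model PEXCW, PEXEH and PEXEW by changing only the index list.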
PEXCW: Parallel Exchange Center Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 0 00000 | 20..16 rt | 15..11 rd | 10..6 PEXCW 11110 | 5..0 MMI3 101001
Format: PEXCW rd, rt
Purpose: To exchange words.
Description: rd ← exchange (rt)
The two central words in GPR rt are exchanged. The results are copied to GPR rd while other words are copied directly to the corresponding words. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← GPR[rt]31..0
GPR[rd]63..32 ← GPR[rt]95..64
GPR[rd]95..64 ← GPR[rt]63..32
GPR[rd]127..96 ← GPR[rt]127..96
Register layout (words):
  Bits   127..96  95..64  63..32  31..0
  rt     A3       A2      A1      A0
  rd     A3       A1      A2      A0
Exceptions:
None
PEXEH: Parallel Exchange Even Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 0 00000 | 20..16 rt | 15..11 rd | 10..6 PEXEH 11010 | 5..0 MMI2 001001
Format: PEXEH rd, rt
Purpose: To exchange halfwords.
Description: rd ← exchange (rt)
The two low-order halfwords of the two words of the high-order doubleword in GPR rt are exchanged and the two low-order halfwords of the two words of the low-order doubleword in GPR rt are exchanged. The results are copied to GPR rd while other halfwords are copied directly to the corresponding halfwords. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]47..32
GPR[rd]31..16 ← GPR[rt]31..16
GPR[rd]47..32 ← GPR[rt]15..0
GPR[rd]63..48 ← GPR[rt]63..48
GPR[rd]79..64 ← GPR[rt]111..96
GPR[rd]95..80 ← GPR[rt]95..80
GPR[rd]111..96 ← GPR[rt]79..64
GPR[rd]127..112 ← GPR[rt]127..112
Register layout (halfwords):
  Bits   127..112  111..96  95..80  79..64  63..48  47..32  31..16  15..0
  rt     A7        A6       A5      A4      A3      A2      A1      A0
  rd     A7        A4       A5      A6      A3      A0      A1      A2
Exceptions:
None
PEXEW: Parallel Exchange Even Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 0 00000 | 20..16 rt | 15..11 rd | 10..6 PEXEW 11110 | 5..0 MMI2 001001
Format: PEXEW rd, rt
Purpose: To exchange words.
Description: rd ← exchange (rt)
The two low-order words of the two doublewords in GPR rt are exchanged. The results are copied to GPR rd while other words are copied directly to the corresponding words. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← GPR[rt]95..64
GPR[rd]63..32 ← GPR[rt]63..32
GPR[rd]95..64 ← GPR[rt]31..0
GPR[rd]127..96 ← GPR[rt]127..96
Register layout (words):
  Bits   127..96  95..64  63..32  31..0
  rt     A3       A2      A1      A0
  rd     A3       A0      A1      A2
Exceptions:
None
PEXT5: Parallel Extend from 5-bits (C790)
Encoding: 31..26 MMI 011100 | 25..21 0 00000 | 20..16 rt | 15..11 rd | 10..6 PEXT5 11110 | 5..0 MMI0 001000
Format: PEXT5 rd, rt
Purpose: To extend bytes from 5-bits.
Description: rd ← extend (rt)
The low-order 16 bits (fields of 1, 5, 5, and 5 bits) of each of the four words in GPR rt are extended to 32 bits (four 8-bit fields). The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]2..0 ← 0^3
GPR[rd]7..3 ← GPR[rt]4..0
GPR[rd]10..8 ← 0^3
GPR[rd]15..11 ← GPR[rt]9..5
GPR[rd]18..16 ← 0^3
GPR[rd]23..19 ← GPR[rt]14..10
GPR[rd]30..24 ← 0^7
GPR[rd]31 ← GPR[rt]15
GPR[rd]34..32 ← 0^3
GPR[rd]39..35 ← GPR[rt]36..32
GPR[rd]42..40 ← 0^3
GPR[rd]47..43 ← GPR[rt]41..37
GPR[rd]50..48 ← 0^3
GPR[rd]55..51 ← GPR[rt]46..42
GPR[rd]62..56 ← 0^7
GPR[rd]63 ← GPR[rt]47
GPR[rd]66..64 ← 0^3
GPR[rd]71..67 ← GPR[rt]68..64
GPR[rd]74..72 ← 0^3
GPR[rd]79..75 ← GPR[rt]73..69
GPR[rd]82..80 ← 0^3
GPR[rd]87..83 ← GPR[rt]78..74
GPR[rd]94..88 ← 0^7
GPR[rd]95 ← GPR[rt]79
GPR[rd]98..96 ← 0^3
GPR[rd]103..99 ← GPR[rt]100..96
GPR[rd]106..104 ← 0^3
GPR[rd]111..107 ← GPR[rt]105..101
GPR[rd]114..112 ← 0^3
GPR[rd]119..115 ← GPR[rt]110..106
GPR[rd]126..120 ← 0^7
GPR[rd]127 ← GPR[rt]111
[Detail of word region (31..0); the other three words are handled identically]
  rt: bit 15 = A3 (1 bit) | bits 14..10 = A2 (5 bits) | bits 9..5 = A1 (5 bits) | bits 4..0 = A0 (5 bits)
  rd: bit 31 = A3 | bits 30..24 = 0^7 | bits 23..19 = A2 | bits 18..16 = 0^3 | bits 15..11 = A1 | bits 10..8 = 0^3 | bits 7..3 = A0 | bits 2..0 = 0^3
Exceptions:
None
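The per-word bit placement of PEXT5 can be sketched as follows, assuming 128-bit values are held in Python integers. The function names `pext5_word` and `pext5` are hypothetical.

```python
def pext5_word(word: int) -> int:
    """Expand the low 16 bits (1-5-5-5 fields) of one word into four 8-bit fields."""
    lo16 = word & 0xFFFF
    a0 = lo16 & 0x1F            # bits 4..0
    a1 = (lo16 >> 5) & 0x1F     # bits 9..5
    a2 = (lo16 >> 10) & 0x1F    # bits 14..10
    a3 = (lo16 >> 15) & 0x1     # bit 15
    # Each 5-bit field lands in the top 5 bits of its byte; the 1-bit field lands in bit 31.
    return (a3 << 31) | (a2 << 19) | (a1 << 11) | (a0 << 3)

def pext5(rt: int) -> int:
    rd = 0
    for lane in range(4):
        word = (rt >> (32 * lane)) & 0xFFFFFFFF
        rd |= pext5_word(word) << (32 * lane)
    return rd
```

The upper 16 bits of each source word are ignored, which the model reflects by masking with 0xFFFF.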
PEXTLB: Parallel Extend Lower from Byte (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PEXTLB 11010 | 5..0 MMI0 001000
Format: PEXTLB rd, rs, rt
Purpose: To extend halfwords from bytes.
Description: rd ← extend (rs, rt)
The contents of the low-order doubleword in GPR rs are combined with the contents of the low-order doubleword in GPR rt in a byte-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]7..0 ← GPR[rt]7..0
GPR[rd]15..8 ← GPR[rs]7..0
GPR[rd]23..16 ← GPR[rt]15..8
GPR[rd]31..24 ← GPR[rs]15..8
GPR[rd]39..32 ← GPR[rt]23..16
GPR[rd]47..40 ← GPR[rs]23..16
GPR[rd]55..48 ← GPR[rt]31..24
GPR[rd]63..56 ← GPR[rs]31..24
GPR[rd]71..64 ← GPR[rt]39..32
GPR[rd]79..72 ← GPR[rs]39..32
GPR[rd]87..80 ← GPR[rt]47..40
GPR[rd]95..88 ← GPR[rs]47..40
GPR[rd]103..96 ← GPR[rt]55..48
GPR[rd]111..104 ← GPR[rs]55..48
GPR[rd]119..112 ← GPR[rt]63..56
GPR[rd]127..120 ← GPR[rs]63..56
Register layout (bytes, most significant first):
  rs (bits 63..0):   A7 A6 A5 A4 A3 A2 A1 A0
  rt (bits 63..0):   B7 B6 B5 B4 B3 B2 B1 B0
  rd (bits 127..0):  A7 B7 A6 B6 A5 B5 A4 B4 A3 B3 A2 B2 A1 B1 A0 B0
Exceptions:
None
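The byte interleave of PEXTLB can be expressed as a short loop. This is a minimal sketch, assuming 128-bit values are held in Python integers; `pextlb` is an illustrative name.

```python
def pextlb(rs: int, rt: int) -> int:
    """Interleave the low eight bytes of rs and rt into a 128-bit result."""
    rd = 0
    for i in range(8):
        byte_rt = (rt >> (8 * i)) & 0xFF
        byte_rs = (rs >> (8 * i)) & 0xFF
        rd |= byte_rt << (16 * i)       # even byte positions come from rt
        rd |= byte_rs << (16 * i + 8)   # odd byte positions come from rs
    return rd

rs = 0xA7A6A5A4A3A2A1A0
rt = 0xB7B6B5B4B3B2B1B0
rd = pextlb(rs, rt)
```

When rs is zero, each byte of rt ends up zero-extended to a halfword, which is one way to read the "extend" in the instruction name.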
PEXTLH: Parallel Extend Lower from Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PEXTLH 10110 | 5..0 MMI0 001000
Format: PEXTLH rd, rs, rt
Purpose: To extend words from halfwords.
Description: rd ← extend (rs, rt)
The contents of the low-order doubleword in GPR rs are combined with the contents of the low-order doubleword in GPR rt in a halfword-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]15..0
GPR[rd]31..16 ← GPR[rs]15..0
GPR[rd]47..32 ← GPR[rt]31..16
GPR[rd]63..48 ← GPR[rs]31..16
GPR[rd]79..64 ← GPR[rt]47..32
GPR[rd]95..80 ← GPR[rs]47..32
GPR[rd]111..96 ← GPR[rt]63..48
GPR[rd]127..112 ← GPR[rs]63..48
Register layout (halfwords, most significant first):
  rs (bits 63..0):   A3 A2 A1 A0
  rt (bits 63..0):   B3 B2 B1 B0
  rd (bits 127..0):  A3 B3 A2 B2 A1 B1 A0 B0
Exceptions:
None
PEXTLW: Parallel Extend Lower from Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PEXTLW 10010 | 5..0 MMI0 001000
Format: PEXTLW rd, rs, rt
Purpose: To extend doublewords from words.
Description: rd ← extend (rs, rt)
The contents of the low-order doubleword in GPR rs are combined with the contents of the low-order doubleword in GPR rt in a word-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← GPR[rt]31..0
GPR[rd]63..32 ← GPR[rs]31..0
GPR[rd]95..64 ← GPR[rt]63..32
GPR[rd]127..96 ← GPR[rs]63..32
Register layout (words, most significant first):
  rs (bits 63..0):   A1 A0
  rt (bits 63..0):   B1 B0
  rd (bits 127..0):  A1 B1 A0 B0
Exceptions:
None
PEXTUB: Parallel Extend Upper from Byte (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PEXTUB 11010 | 5..0 MMI1 101000
Format: PEXTUB rd, rs, rt
Purpose: To extend halfwords from bytes.
Description: rd ← extend (rs, rt)
The contents of the high-order doubleword in GPR rs are combined with the contents of the high-order doubleword in GPR rt in a byte-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]7..0 ← GPR[rt]71..64
GPR[rd]15..8 ← GPR[rs]71..64
GPR[rd]23..16 ← GPR[rt]79..72
GPR[rd]31..24 ← GPR[rs]79..72
GPR[rd]39..32 ← GPR[rt]87..80
GPR[rd]47..40 ← GPR[rs]87..80
GPR[rd]55..48 ← GPR[rt]95..88
GPR[rd]63..56 ← GPR[rs]95..88
GPR[rd]71..64 ← GPR[rt]103..96
GPR[rd]79..72 ← GPR[rs]103..96
GPR[rd]87..80 ← GPR[rt]111..104
GPR[rd]95..88 ← GPR[rs]111..104
GPR[rd]103..96 ← GPR[rt]119..112
GPR[rd]111..104 ← GPR[rs]119..112
GPR[rd]119..112 ← GPR[rt]127..120
GPR[rd]127..120 ← GPR[rs]127..120
Register layout (bytes, most significant first):
  rs (bits 127..64):  A7 A6 A5 A4 A3 A2 A1 A0
  rt (bits 127..64):  B7 B6 B5 B4 B3 B2 B1 B0
  rd (bits 127..0):   A7 B7 A6 B6 A5 B5 A4 B4 A3 B3 A2 B2 A1 B1 A0 B0
Exceptions:
None
PEXTUH: Parallel Extend Upper from Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PEXTUH 10110 | 5..0 MMI1 101000
Format: PEXTUH rd, rs, rt
Purpose: To extend words from halfwords.
Description: rd ← extend (rs, rt)
The contents of the high-order doubleword in GPR rs are combined with the contents of the high-order doubleword in GPR rt in a halfword-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]79..64
GPR[rd]31..16 ← GPR[rs]79..64
GPR[rd]47..32 ← GPR[rt]95..80
GPR[rd]63..48 ← GPR[rs]95..80
GPR[rd]79..64 ← GPR[rt]111..96
GPR[rd]95..80 ← GPR[rs]111..96
GPR[rd]111..96 ← GPR[rt]127..112
GPR[rd]127..112 ← GPR[rs]127..112
Register layout (halfwords, most significant first):
  rs (bits 127..64):  A3 A2 A1 A0
  rt (bits 127..64):  B3 B2 B1 B0
  rd (bits 127..0):   A3 B3 A2 B2 A1 B1 A0 B0
Exceptions:
None
PEXTUW: Parallel Extend Upper from Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PEXTUW 10010 | 5..0 MMI1 101000
Format: PEXTUW rd, rs, rt
Purpose: To extend doublewords from words.
Description: rd ← extend (rs, rt)
The contents of the high-order doubleword in GPR rs are combined with the contents of the high-order doubleword in GPR rt in a word-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← GPR[rt]95..64
GPR[rd]63..32 ← GPR[rs]95..64
GPR[rd]95..64 ← GPR[rt]127..96
GPR[rd]127..96 ← GPR[rs]127..96
Register layout (words, most significant first):
  rs (bits 127..64):  A1 A0
  rt (bits 127..64):  B1 B0
  rd (bits 127..0):   A1 B1 A0 B0
Exceptions:
None
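The word interleave of the upper doublewords performed by PEXTUW can be sketched as below, assuming 128-bit values are held in Python integers. `pextuw` is an illustrative name.

```python
M32 = 0xFFFFFFFF

def pextuw(rs: int, rt: int) -> int:
    """Interleave the two high-order words of rs and rt."""
    rs_hi = rs >> 64    # high-order doubleword of rs
    rt_hi = rt >> 64    # high-order doubleword of rt
    words = [
        rt_hi & M32,            # rd[31..0]   = rt[95..64]
        rs_hi & M32,            # rd[63..32]  = rs[95..64]
        (rt_hi >> 32) & M32,    # rd[95..64]  = rt[127..96]
        (rs_hi >> 32) & M32,    # rd[127..96] = rs[127..96]
    ]
    return sum(w << (32 * i) for i, w in enumerate(words))

# Words labelled by position for illustration: 0xA3 is the top word of rs.
rs = (0xA3 << 96) | (0xA2 << 64) | (0xA1 << 32) | 0xA0
rt = (0xB3 << 96) | (0xB2 << 64) | (0xB1 << 32) | 0xB0
rd = pextuw(rs, rt)
```

The low-order doublewords of both sources do not contribute to the result.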
PHMADH: Parallel Horizontal Multiply-Add Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PHMADH 10001 | 5..0 MMI2 001001
Format: PHMADH rd, rs, rt
Purpose: To multiply 8 pairs of 16-bit signed integers and horizontally add.
Description: (rd, HI, LO) ← rs x rt + rs x rt
The eight signed halfwords in GPR rs are multiplied by the eight signed halfwords in GPR rt in parallel. The four word multiply results are added to the other four word multiply results, and the four word results are placed into the corresponding words in special registers HI, LO and GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
prod0 ← GPR[rs]31..16 x GPR[rt]31..16 + GPR[rs]15..0 x GPR[rt]15..0
prod1 ← GPR[rs]63..48 x GPR[rt]63..48 + GPR[rs]47..32 x GPR[rt]47..32
prod2 ← GPR[rs]95..80 x GPR[rt]95..80 + GPR[rs]79..64 x GPR[rt]79..64
prod3 ← GPR[rs]127..112 x GPR[rt]127..112 + GPR[rs]111..96 x GPR[rt]111..96
LO31..0 ← prod0_31..0
LO63..32 ← Undefined
HI31..0 ← prod1_31..0
HI63..32 ← Undefined
LO95..64 ← prod2_31..0
LO127..96 ← Undefined
HI95..64 ← prod3_31..0
HI127..96 ← Undefined
GPR[rd]31..0 ← prod0_31..0
GPR[rd]63..32 ← prod1_31..0
GPR[rd]95..64 ← prod2_31..0
GPR[rd]127..96 ← prod3_31..0
Register layout (rs holds halfwords A7..A0 and rt holds halfwords B7..B0, most significant first):
  Bits   127..96         95..64          63..32          31..0
  rd     A7xB7 + A6xB6   A5xB5 + A4xB4   A3xB3 + A2xB2   A1xB1 + A0xB0
  HI     Undefined       A7xB7 + A6xB6   Undefined       A3xB3 + A2xB2
  LO     Undefined       A5xB5 + A4xB4   Undefined       A1xB1 + A0xB0
Exceptions:
None
Programming Notes:
In the C790, the integer multiply operation allows other CPU instructions to execute out of order. An attempt to read the LO or HI register before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel. Programs that require overflow detection must check for it explicitly.
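The horizontal multiply-add can be sketched in Python as follows. The model assumes 128-bit values are held in Python integers and returns only the GPR rd result, because half of the HI and LO words are undefined; the names `s16`, `pack_halfwords` and `phmadh` are hypothetical.

```python
def s16(x: int) -> int:
    """Interpret the low 16 bits as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pack_halfwords(values):
    """Pack eight signed halfwords; index 0 is bits 15..0."""
    return sum((v & 0xFFFF) << (16 * i) for i, v in enumerate(values))

def phmadh(rs: int, rt: int) -> int:
    """Return GPR rd: four words, each the sum of two adjacent halfword products."""
    rd = 0
    for lane in range(4):
        lo_prod = s16(rs >> (32 * lane)) * s16(rt >> (32 * lane))
        hi_prod = s16(rs >> (32 * lane + 16)) * s16(rt >> (32 * lane + 16))
        # Only bits 31..0 of the sum are kept; no exception is raised on overflow.
        rd |= ((hi_prod + lo_prod) & 0xFFFFFFFF) << (32 * lane)
    return rd

rs = pack_halfwords([1, 2, 3, 4, -1, 5, 0, 0])
rt = pack_halfwords([10, 20, 30, 40, 7, -2, 0, 0])
rd = phmadh(rs, rt)
```

Per the Operation listing, the same four words also appear in LO31..0, HI31..0, LO95..64 and HI95..64. Replacing the addition with a subtraction gives a model of PHMSBH.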
PHMSBH: Parallel Horizontal Multiply-Subtract Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PHMSBH 10101 | 5..0 MMI2 001001
Format: PHMSBH rd, rs, rt
Purpose: To multiply 8 pairs of 16-bit signed integers and horizontally subtract.
Description: (rd, HI, LO) ← rs x rt - rs x rt
The eight signed halfwords in GPR rs are multiplied by the eight signed halfwords in GPR rt in parallel. The four word multiply results are subtracted from the other four word multiply results, and the four word results are placed into the corresponding words in special registers HI, LO and GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
prod0 ← GPR[rs]31..16 x GPR[rt]31..16 - GPR[rs]15..0 x GPR[rt]15..0
prod1 ← GPR[rs]63..48 x GPR[rt]63..48 - GPR[rs]47..32 x GPR[rt]47..32
prod2 ← GPR[rs]95..80 x GPR[rt]95..80 - GPR[rs]79..64 x GPR[rt]79..64
prod3 ← GPR[rs]127..112 x GPR[rt]127..112 - GPR[rs]111..96 x GPR[rt]111..96
LO31..0 ← prod0_31..0
LO63..32 ← Undefined
HI31..0 ← prod1_31..0
HI63..32 ← Undefined
LO95..64 ← prod2_31..0
LO127..96 ← Undefined
HI95..64 ← prod3_31..0
HI127..96 ← Undefined
GPR[rd]31..0 ← prod0_31..0
GPR[rd]63..32 ← prod1_31..0
GPR[rd]95..64 ← prod2_31..0
GPR[rd]127..96 ← prod3_31..0
Register layout (rs holds halfwords A7..A0 and rt holds halfwords B7..B0, most significant first):
  Bits   127..96         95..64          63..32          31..0
  rd     A7xB7 - A6xB6   A5xB5 - A4xB4   A3xB3 - A2xB2   A1xB1 - A0xB0
  HI     Undefined       A7xB7 - A6xB6   Undefined       A3xB3 - A2xB2
  LO     Undefined       A5xB5 - A4xB4   Undefined       A1xB1 - A0xB0
Exceptions:
None
Programming Notes:
In the C790, the integer multiply operation allows other CPU instructions to execute out of order. An attempt to read the LO or HI register before the results are written will wait (interlock) until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel. Programs that require overflow detection must check for it explicitly.
PINTEH: Parallel Interleave Even Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PINTEH 01010 | 5..0 MMI3 101001
Format: PINTEH rd, rs, rt
Purpose: To combine halfwords in a halfword-wide interleaved operation.
Description: rd ← interleave (rs, rt)
The low-order halfwords of the four words in GPR rs are combined with the low-order halfwords of the four words in GPR rt in a halfword-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]15..0
GPR[rd]31..16 ← GPR[rs]15..0
GPR[rd]47..32 ← GPR[rt]47..32
GPR[rd]63..48 ← GPR[rs]47..32
GPR[rd]79..64 ← GPR[rt]79..64
GPR[rd]95..80 ← GPR[rs]79..64
GPR[rd]111..96 ← GPR[rt]111..96
GPR[rd]127..112 ← GPR[rs]111..96
Register layout (halfwords, most significant first):
  rs (low-order halfword of each word):  A3 A2 A1 A0
  rt (low-order halfword of each word):  B3 B2 B1 B0
  rd (bits 127..0):                      A3 B3 A2 B2 A1 B1 A0 B0
Exceptions:
None
PINTH: Parallel Interleave Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PINTH 01010 | 5..0 MMI2 001001
Format: PINTH rd, rs, rt
Purpose: To combine doublewords in a halfword-wide interleaved operation.
Description: rd ← interleave (rs, rt)
The contents of the high-order doubleword in GPR rs are combined with the contents of the low-order doubleword in GPR rt in a halfword-wide interleaved operation. The quadword result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← GPR[rt]15..0
GPR[rd]31..16 ← GPR[rs]79..64
GPR[rd]47..32 ← GPR[rt]31..16
GPR[rd]63..48 ← GPR[rs]95..80
GPR[rd]79..64 ← GPR[rt]47..32
GPR[rd]95..80 ← GPR[rs]111..96
GPR[rd]111..96 ← GPR[rt]63..48
GPR[rd]127..112 ← GPR[rs]127..112
Register layout (halfwords, most significant first):
  rs (bits 127..64):  A3 A2 A1 A0
  rt (bits 63..0):    B3 B2 B1 B0
  rd (bits 127..0):   A3 B3 A2 B2 A1 B1 A0 B0
Exceptions:
None
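PINTH differs from the PEXT family in that it takes the high-order doubleword of rs but the low-order doubleword of rt. A minimal sketch, assuming 128-bit values are held in Python integers (`pinth` is an illustrative name):

```python
def pinth(rs: int, rt: int) -> int:
    """Interleave the upper four halfwords of rs with the lower four halfwords of rt."""
    rd = 0
    for i in range(4):
        from_rt = (rt >> (16 * i)) & 0xFFFF          # rt halfword i (low doubleword)
        from_rs = (rs >> (64 + 16 * i)) & 0xFFFF     # rs halfword i + 4 (high doubleword)
        rd |= from_rt << (32 * i)
        rd |= from_rs << (32 * i + 16)
    return rd

rs = 0x00A300A200A100A0 << 64     # A3..A0 in the high-order doubleword
rt = 0x00B300B200B100B0           # B3..B0 in the low-order doubleword
rd = pinth(rs, rt)
```

Changing the rs shift from `64 + 16 * i` to `16 * i` would turn this into a model of PEXTLH.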
PLZCW: Parallel Leading Zero or one Count Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 0 00000 | 15..11 rd | 10..6 0 00000 | 5..0 PLZCW 000100
Format: PLZCW rd, rs
Purpose: To count leading zero (s) or one (s) (2 parallel operations).
Description: rd ← LZC (rs) - 1
The leading zeros or ones of each of the two words in GPR rs are counted. Each count minus one is placed into the corresponding word in GPR rd.
Operation:
GPR[rd]31..0 ← Leading zero or one count (GPR[rs]31..0) - 1
GPR[rd]63..32 ← Leading zero or one count (GPR[rs]63..32) - 1
Register layout:
  Bits   63..32        31..0
  rs     A1            A0
  rd     LZC(A1) - 1   LZC(A0) - 1
Example:
  Bits   63..32                            31..0
  rs     0x000FFFFF (leading zero count)   0xFF000000 (leading one count)
  rd     0x0000000B                        0x00000007
Exceptions:
None
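The example above can be reproduced with a short model. This sketch assumes the count is the length of the run of bits equal to bit 31 of each word, which is consistent with the example values; the names `leading_sign_run` and `plzcw` are hypothetical.

```python
def leading_sign_run(word: int) -> int:
    """Number of leading bits of a 32-bit word that equal bit 31."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:
        word ^= 0xFFFFFFFF      # leading ones become leading zeros
    return 32 - word.bit_length()

def plzcw(rs: int) -> int:
    """Return the 64-bit result: (count - 1) for each of the two words."""
    lo = leading_sign_run(rs) - 1
    hi = leading_sign_run(rs >> 32) - 1
    return (hi << 32) | lo

rs = (0x000FFFFF << 32) | 0xFF000000
rd = plzcw(rs)
```

With the operands from the example, the upper word has 12 leading zeros and the lower word 8 leading ones, giving 0x0000000B and 0x00000007.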
PMADDH: Parallel Multiply-Add Halfword (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PMADDH 10000 | 5..0 MMI2 001001
Format: PMADDH rd, rs, rt
Purpose: To multiply 8 pairs of 16-bit signed integers and accumulate, in parallel.
Description: (rd, HI, LO) ← (HI, LO) + rs x rt
The eight signed halfwords in GPR rs are multiplied by the eight signed halfwords in GPR rt in parallel. The eight word multiply results are added to the corresponding words in special registers HI and LO, and the word results are placed into the corresponding words in special registers HI, LO and GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
prod0 ← LO31..0 + GPR[rs]15..0 x GPR[rt]15..0
prod1 ← LO63..32 + GPR[rs]31..16 x GPR[rt]31..16
prod2 ← HI31..0 + GPR[rs]47..32 x GPR[rt]47..32
prod3 ← HI63..32 + GPR[rs]63..48 x GPR[rt]63..48
prod4 ← LO95..64 + GPR[rs]79..64 x GPR[rt]79..64
prod5 ← LO127..96 + GPR[rs]95..80 x GPR[rt]95..80
prod6 ← HI95..64 + GPR[rs]111..96 x GPR[rt]111..96
prod7 ← HI127..96 + GPR[rs]127..112 x GPR[rt]127..112
LO31..0 ← prod0_31..0
LO63..32 ← prod1_31..0
HI31..0 ← prod2_31..0
HI63..32 ← prod3_31..0
LO95..64 ← prod4_31..0
LO127..96 ← prod5_31..0
HI95..64 ← prod6_31..0
HI127..96 ← prod7_31..0
GPR[rd]31..0 ← prod0_31..0
GPR[rd]63..32 ← prod2_31..0
GPR[rd]95..64 ← prod4_31..0
GPR[rd]127..96 ← prod6_31..0
Register layout (rs holds halfwords A7..A0 and rt holds halfwords B7..B0; C0..C7 are the accumulator words before the operation):
  Bits          127..96        95..64         63..32         31..0
  HI (before)   C7             C6             C3             C2
  LO (before)   C5             C4             C1             C0
  rd            A6 x B6 + C6   A4 x B4 + C4   A2 x B2 + C2   A0 x B0 + C0
  HI (after)    A7 x B7 + C7   A6 x B6 + C6   A3 x B3 + C3   A2 x B2 + C2
  LO (after)    A5 x B5 + C5   A4 x B4 + C4   A1 x B1 + C1   A0 x B0 + C0
Exceptions:
None
Programming Notes:
In the C790, the integer multiply operation allows other CPU instructions to execute out of order. An attempt to read the LO or HI register before the results are written will cause an interlock until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel. Programs that require overflow detection must check for it explicitly.
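The mapping between halfword lanes and accumulator words in PMADDH is easy to get wrong, so a small model may help. This is a sketch under the assumption that 128-bit values are held in Python integers; all function names are hypothetical.

```python
M32 = 0xFFFFFFFF

def s16(x: int) -> int:
    """Interpret the low 16 bits as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pack_halfwords(values):
    return sum((v & 0xFFFF) << (16 * i) for i, v in enumerate(values))

def pack_words(values):
    return sum((v & M32) << (32 * i) for i, v in enumerate(values))

# Accumulator word used by each halfword lane 0..7, following the Operation listing.
ACC_MAP = [("lo", 0), ("lo", 1), ("hi", 0), ("hi", 1),
           ("lo", 2), ("lo", 3), ("hi", 2), ("hi", 3)]

def pmaddh(rs: int, rt: int, hi: int, lo: int):
    """Return (rd, HI, LO) after the multiply-accumulate."""
    old = {"hi": hi, "lo": lo}
    new = {"hi": 0, "lo": 0}
    prods = []
    for lane, (reg, word) in enumerate(ACC_MAP):
        acc = (old[reg] >> (32 * word)) & M32
        prod = (acc + s16(rs >> (16 * lane)) * s16(rt >> (16 * lane))) & M32
        prods.append(prod)
        new[reg] |= prod << (32 * word)
    rd = pack_words([prods[0], prods[2], prods[4], prods[6]])  # even lanes only
    return rd, new["hi"], new["lo"]
```

For example, with rs halfwords 1..8, every rt halfword equal to 10 and every accumulator word equal to 100, the even-lane results 110, 130, 150 and 170 are the ones that reach GPR rd.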
PMADDUW: Parallel Multiply-Add Unsigned Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PMADDUW 00000 | 5..0 MMI3 101001
Format: PMADDUW rd, rs, rt
Purpose: To multiply 2 pairs of 32-bit unsigned integers and accumulate in parallel.
Description: (rd, HI, LO) ← (HI, LO) + rs x rt
The low-order unsigned words of the two doublewords in GPR rs are multiplied by the low-order unsigned words of the two doublewords in GPR rt, in parallel. The two 64-bit multiply results are added to the contents of special registers HI and LO. The low-order words of the two doubleword results are placed into special register LO, and the high-order words of the two doubleword results are placed into special register HI. The two doubleword results are placed into GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain zero-extended 32-bit values (bits 127..96 and 63..32 equal zero), the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then UndefinedResult() endif
prod0 ← (HI31..0 || LO31..0) + (0 || GPR[rs]31..0) x (0 || GPR[rt]31..0)
prod1 ← (HI95..64 || LO95..64) + (0 || GPR[rs]95..64) x (0 || GPR[rt]95..64)
LO63..0 ← (prod0_31)^32 || prod0_31..0
HI63..0 ← (prod0_63)^32 || prod0_63..32
LO127..64 ← (prod1_31)^32 || prod1_31..0
HI127..64 ← (prod1_63)^32 || prod1_63..32
GPR[rd]63..0 ← prod0_63..0
GPR[rd]127..64 ← prod1_63..0
Register layout (rs holds words A3..A0 and rt holds words B3..B0; C0..C7 are the accumulator words before the operation):
  Bits          127..96   95..64   63..32   31..0
  HI (before)   C7        C6       C3       C2
  LO (before)   C5        C4       C1       C0
  Bits          127..64                                                  63..0
  rd            (0 || A2) x (0 || B2) + (C6 || C4)                       (0 || A0) x (0 || B0) + (C2 || C0)
  HI (after)    sign ext (((0 || A2) x (0 || B2) + (C6 || C4))63..32)   sign ext (((0 || A0) x (0 || B0) + (C2 || C0))63..32)
  LO (after)    sign ext (((0 || A2) x (0 || B2) + (C6 || C4))31..0)    sign ext (((0 || A0) x (0 || B0) + (C2 || C0))31..0)
Exceptions:
None
Programming Notes:
See the Programming Notes for the PMADDH instruction.
PMADDW: Parallel Multiply-Add Word (C790)
Encoding: 31..26 MMI 011100 | 25..21 rs | 20..16 rt | 15..11 rd | 10..6 PMADDW 00000 | 5..0 MMI2 001001
Format: PMADDW rd, rs, rt
Purpose: To multiply 2 pairs of 32-bit signed integers and accumulate in parallel.
Description: (rd, HI, LO) ← (HI, LO) + rs x rt
The low-order signed words of the two doublewords in GPR rs are multiplied by the low-order signed words of the two doublewords in GPR rt, in parallel. The two 64-bit multiply results are added to the contents of special registers HI and LO. The low-order words of the two doubleword results are placed into special register LO, and the high-order words of the two doubleword results are placed into special register HI. The two doubleword results are placed into GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain sign-extended 32-bit values (bits 127..95 and 63..31 equal), the result of the operation is undefined.
Operation:
if (NotWordValue (GPR[rs]) or NotWordValue (GPR[rt])) then UndefinedResult() endif
prod0 ← (HI31..0 || LO31..0) + GPR[rs]31..0 x GPR[rt]31..0
prod1 ← (HI95..64 || LO95..64) + GPR[rs]95..64 x GPR[rt]95..64
LO63..0 ← (prod0_31)^32 || prod0_31..0
HI63..0 ← (prod0_63)^32 || prod0_63..32
LO127..64 ← (prod1_31)^32 || prod1_31..0
HI127..64 ← (prod1_63)^32 || prod1_63..32
GPR[rd]63..0 ← prod0_63..0
GPR[rd]127..64 ← prod1_63..0
Register layout (rs holds words A3..A0 and rt holds words B3..B0; C0..C7 are the accumulator words before the operation):
  Bits          127..96   95..64   63..32   31..0
  HI (before)   C7        C6       C3       C2
  LO (before)   C5        C4       C1       C0
  Bits          127..64                                    63..0
  rd            A2 x B2 + (C6 || C4)                       A0 x B0 + (C2 || C0)
  HI (after)    sign ext ((A2 x B2 + (C6 || C4))63..32)   sign ext ((A0 x B0 + (C2 || C0))63..32)
  LO (after)    sign ext ((A2 x B2 + (C6 || C4))31..0)    sign ext ((A0 x B0 + (C2 || C0))31..0)
Exceptions:
None
Programming Notes:
See the Programming Notes for the PMADDH instruction.
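The 64-bit accumulate of PMADDW, and the way its result is split between HI, LO and GPR rd, can be sketched as follows. The model assumes 128-bit values are held in Python integers and that the operands satisfy the sign-extension restriction; the function names are hypothetical.

```python
M32 = 0xFFFFFFFF
M64 = (1 << 64) - 1

def s32(x: int) -> int:
    """Interpret the low 32 bits as a signed word."""
    x &= M32
    return x - (1 << 32) if x & 0x80000000 else x

def sext32_to_64(word: int) -> int:
    """(w_31)^32 || w_31..0 as an unsigned 64-bit pattern."""
    return s32(word) & M64

def pmaddw(rs: int, rt: int, hi: int, lo: int):
    """Return (rd, HI, LO) after the multiply-accumulate."""
    rd = new_hi = new_lo = 0
    for lane in range(2):
        sh = 64 * lane
        # The 64-bit accumulator is HI word || LO word of this doubleword lane.
        acc = (((hi >> sh) & M32) << 32) | ((lo >> sh) & M32)
        prod = (acc + s32(rs >> sh) * s32(rt >> sh)) & M64
        rd |= prod << sh
        new_lo |= sext32_to_64(prod & M32) << sh
        new_hi |= sext32_to_64(prod >> 32) << sh
    return rd, new_hi, new_lo
```

GPR rd receives the full 64-bit sums, whereas HI and LO each receive one 32-bit half, sign-extended to a doubleword.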
PMAXH
31 26 25 21 20
Parallel Maximum Halfword 16 15 11 10 65
PMAXH
0
MMI 011100
6
rs
5 5
rt
rd
5
PMAXH 00111
5
MMI0 001000
6 C790
Format: Purpose: Description:
PMAXH rd, rs, rt To select maximum 16-bit signed integers (8 parallel operations). rd max (rs, rt)
The eight signed halfword values in GPR rt are subtracted from the corresponding eight signed halfword values in GPR rs in parallel. If the result of a subtraction is larger than zero, the corresponding signed halfword value in GPR rs is placed into the corresponding halfword of GPR rd; otherwise, the corresponding signed halfword value in GPR rt is placed there. This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]_15..0 - GPR[rt]_15..0) > 0) then GPR[rd]_15..0 ← GPR[rs]_15..0 else GPR[rd]_15..0 ← GPR[rt]_15..0 endif
if ((GPR[rs]_31..16 - GPR[rt]_31..16) > 0) then GPR[rd]_31..16 ← GPR[rs]_31..16 else GPR[rd]_31..16 ← GPR[rt]_31..16 endif
if ((GPR[rs]_47..32 - GPR[rt]_47..32) > 0) then GPR[rd]_47..32 ← GPR[rs]_47..32 else GPR[rd]_47..32 ← GPR[rt]_47..32 endif
if ((GPR[rs]_63..48 - GPR[rt]_63..48) > 0) then GPR[rd]_63..48 ← GPR[rs]_63..48 else GPR[rd]_63..48 ← GPR[rt]_63..48 endif
if ((GPR[rs]_79..64 - GPR[rt]_79..64) > 0) then GPR[rd]_79..64 ← GPR[rs]_79..64 else GPR[rd]_79..64 ← GPR[rt]_79..64 endif
if ((GPR[rs]_95..80 - GPR[rt]_95..80) > 0) then GPR[rd]_95..80 ← GPR[rs]_95..80 else GPR[rd]_95..80 ← GPR[rt]_95..80 endif
if ((GPR[rs]_111..96 - GPR[rt]_111..96) > 0) then GPR[rd]_111..96 ← GPR[rs]_111..96 else GPR[rd]_111..96 ← GPR[rt]_111..96 endif
if ((GPR[rs]_127..112 - GPR[rt]_127..112) > 0) then GPR[rd]_127..112 ← GPR[rs]_127..112 else GPR[rd]_127..112 ← GPR[rt]_127..112 endif
[Figure: PMAXH. With A7..A0 the halfwords of rs and B7..B0 the halfwords of rt, halfword i of rd receives max (Ai, Bi).]
Exceptions:
None
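As an illustration of the lane-wise selection described for PMAXH, the following Python sketch splits two 128-bit register images into signed halfwords and keeps the larger of each pair. The names `split_lanes`, `join_lanes` and `pmaxh` are hypothetical. The sketch uses a true signed comparison, matching the max (Ai, Bi) shown in the diagram, and makes no claim about how the hardware handles overflow in the 16-bit subtraction.

```python
def split_lanes(reg, width):
    """Split a 128-bit register image into signed lanes, least significant first."""
    lanes = []
    for i in range(128 // width):
        value = (reg >> (width * i)) & ((1 << width) - 1)
        lanes.append(value - (1 << width) if value >> (width - 1) else value)
    return lanes

def join_lanes(lanes, width):
    """Pack signed lanes back into a 128-bit register image."""
    reg = 0
    for i, value in enumerate(lanes):
        reg |= (value & ((1 << width) - 1)) << (width * i)
    return reg

def pmaxh(rs, rt):
    """Lane-wise signed maximum of eight halfwords (illustrative model)."""
    pairs = zip(split_lanes(rs, 16), split_lanes(rt, 16))
    return join_lanes([max(a, b) for a, b in pairs], 16)
```

Passing a width of 32 to the two helpers and replacing `max` with `min` gives corresponding models of the word and minimum variants.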
PMAXW: Parallel Maximum Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMAXW 00011 | 5..0: MMI0 001000
Format: PMAXW rd, rs, rt
Purpose: To select maximum 32-bit signed integers (4 parallel operations).
Description: rd ← max (rs, rt)
The four signed word values in GPR rt are subtracted from the corresponding four signed word values in GPR rs in parallel. If the result of a subtraction is larger than zero, the corresponding signed word value in GPR rs is placed into the corresponding word of GPR rd; otherwise, the corresponding signed word value in GPR rt is placed there. This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]_31..0 - GPR[rt]_31..0) > 0) then GPR[rd]_31..0 ← GPR[rs]_31..0 else GPR[rd]_31..0 ← GPR[rt]_31..0 endif
if ((GPR[rs]_63..32 - GPR[rt]_63..32) > 0) then GPR[rd]_63..32 ← GPR[rs]_63..32 else GPR[rd]_63..32 ← GPR[rt]_63..32 endif
if ((GPR[rs]_95..64 - GPR[rt]_95..64) > 0) then GPR[rd]_95..64 ← GPR[rs]_95..64 else GPR[rd]_95..64 ← GPR[rt]_95..64 endif
if ((GPR[rs]_127..96 - GPR[rt]_127..96) > 0) then GPR[rd]_127..96 ← GPR[rs]_127..96 else GPR[rd]_127..96 ← GPR[rt]_127..96 endif
[Figure: PMAXW. With A3..A0 the words of rs and B3..B0 the words of rt, word i of rd receives max (Ai, Bi).]
Exceptions:
None
PMFHI: Parallel Move From HI Register (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..16: 0 (0000000000) | 15..11: rd | 10..6: PMFHI 01000 | 5..0: MMI2 001001
Format: PMFHI rd
Purpose: To copy the special purpose register HI to a GPR.
Description: rd ← HI
The contents of special register HI are loaded into GPR rd. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
GPR[rd]_127..0 ← HI_127..0
[Figure: PMFHI. Both doublewords of HI (A1, A0) are copied unchanged to rd.]
Exceptions:
None
PMFHL.fmt: Parallel Move From HI / LO Register (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..16: 0 (0000000000) | 15..11: rd | 10..6: fmt | 5..0: PMFHL 110000
Format:
PMFHL.LW rd (fmt = 0)
PMFHL.UW rd (fmt = 1)
PMFHL.SLW rd (fmt = 2)
PMFHL.LH rd (fmt = 3)
PMFHL.SH rd (fmt = 4)
Purpose: To copy the special purpose registers HI / LO to a GPR.
Description: rd ← HI / LO
The contents of special registers HI / LO are loaded into GPR rd. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
if (fmt = 0) then
GPR[rd]_31..0 ← LO_31..0
GPR[rd]_63..32 ← HI_31..0
GPR[rd]_95..64 ← LO_95..64
GPR[rd]_127..96 ← HI_95..64
else if (fmt = 1) then
GPR[rd]_31..0 ← LO_63..32
GPR[rd]_63..32 ← HI_63..32
GPR[rd]_95..64 ← LO_127..96
GPR[rd]_127..96 ← HI_127..96
else if (fmt = 2) then
if (0x7FFFFFFFFFFFFFFF >= (HI_31..0 || LO_31..0) > 0x000000007FFFFFFF) then GPR[rd]_63..0 ← 0x000000007FFFFFFF
else if (0x8000000000000000 <= (HI_31..0 || LO_31..0) < -0x0000000080000000) then GPR[rd]_63..0 ← 0xFFFFFFFF80000000
else GPR[rd]_63..0 ← HI_31..0 || LO_31..0 endif
if ((HI_95..64 || LO_95..64) > 0x000000007FFFFFFF) then GPR[rd]_127..64 ← 0x000000007FFFFFFF
else if ((HI_95..64 || LO_95..64) < -0x0000000080000000) then GPR[rd]_127..64 ← -0x0000000080000000
else GPR[rd]_127..64 ← (LO_95)^32 || LO_95..64 endif
else if (fmt = 3) then
GPR[rd]_15..0 ← LO_15..0
GPR[rd]_31..16 ← LO_47..32
GPR[rd]_47..32 ← HI_15..0
GPR[rd]_63..48 ← HI_47..32
GPR[rd]_79..64 ← LO_79..64
GPR[rd]_95..80 ← LO_111..96
GPR[rd]_111..96 ← HI_79..64
GPR[rd]_127..112 ← HI_111..96
else if (fmt = 4) then
if (0x7FFFFFFF >= LO_31..0 > 0x00007FFF) then GPR[rd]_15..0 ← 0x7FFF else if (0x80000000 <= LO_31..0 < 0xFFFF8000) then GPR[rd]_15..0 ← 0x8000 else GPR[rd]_15..0 ← LO_15..0 endif
if (LO_63..32 > 0x00007FFF) then GPR[rd]_31..16 ← 0x7FFF else if (LO_63..32 < 0xFFFF8000) then GPR[rd]_31..16 ← 0x8000 else GPR[rd]_31..16 ← LO_47..32 endif
if (HI_31..0 > 0x00007FFF) then GPR[rd]_47..32 ← 0x7FFF else if (HI_31..0 < 0xFFFF8000) then GPR[rd]_47..32 ← 0x8000 else GPR[rd]_47..32 ← HI_15..0 endif
if (HI_63..32 > 0x00007FFF) then GPR[rd]_63..48 ← 0x7FFF else if (HI_63..32 < 0xFFFF8000) then GPR[rd]_63..48 ← 0x8000 else GPR[rd]_63..48 ← HI_47..32 endif
if (LO_95..64 > 0x00007FFF) then GPR[rd]_79..64 ← 0x7FFF else if (LO_95..64 < 0xFFFF8000) then GPR[rd]_79..64 ← 0x8000 else GPR[rd]_79..64 ← LO_79..64 endif
if (LO_127..96 > 0x00007FFF) then GPR[rd]_95..80 ← 0x7FFF else if (LO_127..96 < 0xFFFF8000) then GPR[rd]_95..80 ← 0x8000 else GPR[rd]_95..80 ← LO_111..96 endif
if (HI_95..64 > 0x00007FFF) then GPR[rd]_111..96 ← 0x7FFF else if (HI_95..64 < 0xFFFF8000) then GPR[rd]_111..96 ← 0x8000 else GPR[rd]_111..96 ← HI_79..64 endif
if (HI_127..96 > 0x00007FFF) then GPR[rd]_127..112 ← 0x7FFF else if (HI_127..96 < 0xFFFF8000) then GPR[rd]_127..112 ← 0x8000 else GPR[rd]_127..112 ← HI_111..96 endif
endif
[Figure: PMFHL.fmt data flow, one diagram per format, fields listed most significant first.
fmt = 0: the even-numbered words of HI (A1, A0) and LO (B1, B0) are interleaved into rd as A1, B1, A0, B0.
fmt = 1: the same arrangement, using the odd-numbered words of HI and LO.
fmt = 2: each doubleword of rd receives the 64-bit value formed from a HI word and a LO word (A1 || B1 and A0 || B0), saturated to a signed word and sign-extended.
fmt = 3: the low-order halfwords of the words of HI (A3..A0) and LO (B3..B0) are placed into rd as A3, A2, B3, B2, A1, A0, B1, B0.
fmt = 4: the words of HI (A3..A0) and LO (B3..B0) are each saturated to a signed halfword and placed into rd as A3, A2, B3, B2, A1, A0, B1, B0.]
Exceptions:
None
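The five PMFHL formats differ only in which fields of HI and LO they gather and whether they saturate. The Python sketch below models that selection. It is an illustrative reading of the Operation listing, not a reference implementation: the function and helper names are hypothetical, registers are plain integers holding 128-bit images, and the saturating formats are modelled as a signed clamp, which is how the diagrams label them.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def saturate(value, bits):
    """Clamp a signed integer to the range of a `bits`-bit signed value."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))

def field(reg, index, width):
    """Return field `index` of `width` bits, counting from bit 0."""
    return (reg >> (width * index)) & ((1 << width) - 1)

def join(values, width):
    """Concatenate fields (least significant first) into one register image."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & ((1 << width) - 1)) << (width * i)
    return reg

def pmfhl(fmt, hi, lo):
    if fmt == 0:    # LW: even-numbered words
        return join([field(lo, 0, 32), field(hi, 0, 32), field(lo, 2, 32), field(hi, 2, 32)], 32)
    if fmt == 1:    # UW: odd-numbered words
        return join([field(lo, 1, 32), field(hi, 1, 32), field(lo, 3, 32), field(hi, 3, 32)], 32)
    if fmt == 2:    # SLW: (HI word || LO word) saturated to a signed word
        sums = [to_signed((field(hi, w, 32) << 32) | field(lo, w, 32), 64) for w in (0, 2)]
        return join([saturate(v, 32) for v in sums], 64)
    if fmt == 3:    # LH: low halfword of each word
        order = [(lo, 0), (lo, 2), (hi, 0), (hi, 2), (lo, 4), (lo, 6), (hi, 4), (hi, 6)]
        return join([field(reg, h, 16) for reg, h in order], 16)
    if fmt == 4:    # SH: each word saturated to a signed halfword
        order = [(lo, 0), (lo, 1), (hi, 0), (hi, 1), (lo, 2), (lo, 3), (hi, 2), (hi, 3)]
        return join([saturate(to_signed(field(reg, w, 32), 32), 16) for reg, w in order], 16)
    raise ValueError("fmt must be 0..4")
```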
PMFLO: Parallel Move From LO Register (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..16: 0 (0000000000) | 15..11: rd | 10..6: PMFLO 01001 | 5..0: MMI2 001001
Format: PMFLO rd
Purpose: To copy the special purpose register LO to a GPR.
Description: rd ← LO
The contents of special register LO are loaded into GPR rd. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
GPR[rd]_127..0 ← LO_127..0
[Figure: PMFLO. Both doublewords of LO (A1, A0) are copied unchanged to rd.]
Exceptions:
None
PMINH: Parallel Minimum Halfword (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMINH 00111 | 5..0: MMI1 101000
Format: PMINH rd, rs, rt
Purpose: To select the minimum of two 16-bit signed integers (8 parallel operations).
Description: rd ← min (rs, rt)
The eight signed halfword values in GPR rt are subtracted from the corresponding eight signed halfword values in GPR rs in parallel. If the result of a subtraction is larger than zero, the corresponding signed halfword in GPR rt is placed into the corresponding halfword of GPR rd; otherwise, the corresponding signed halfword in GPR rs is placed there. This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]_15..0 - GPR[rt]_15..0) > 0) then GPR[rd]_15..0 ← GPR[rt]_15..0 else GPR[rd]_15..0 ← GPR[rs]_15..0 endif
if ((GPR[rs]_31..16 - GPR[rt]_31..16) > 0) then GPR[rd]_31..16 ← GPR[rt]_31..16 else GPR[rd]_31..16 ← GPR[rs]_31..16 endif
if ((GPR[rs]_47..32 - GPR[rt]_47..32) > 0) then GPR[rd]_47..32 ← GPR[rt]_47..32 else GPR[rd]_47..32 ← GPR[rs]_47..32 endif
if ((GPR[rs]_63..48 - GPR[rt]_63..48) > 0) then GPR[rd]_63..48 ← GPR[rt]_63..48 else GPR[rd]_63..48 ← GPR[rs]_63..48 endif
if ((GPR[rs]_79..64 - GPR[rt]_79..64) > 0) then GPR[rd]_79..64 ← GPR[rt]_79..64 else GPR[rd]_79..64 ← GPR[rs]_79..64 endif
if ((GPR[rs]_95..80 - GPR[rt]_95..80) > 0) then GPR[rd]_95..80 ← GPR[rt]_95..80 else GPR[rd]_95..80 ← GPR[rs]_95..80 endif
if ((GPR[rs]_111..96 - GPR[rt]_111..96) > 0) then GPR[rd]_111..96 ← GPR[rt]_111..96 else GPR[rd]_111..96 ← GPR[rs]_111..96 endif
if ((GPR[rs]_127..112 - GPR[rt]_127..112) > 0) then GPR[rd]_127..112 ← GPR[rt]_127..112 else GPR[rd]_127..112 ← GPR[rs]_127..112 endif
[Figure: PMINH. With A7..A0 the halfwords of rs and B7..B0 the halfwords of rt, halfword i of rd receives min (Ai, Bi).]
Exceptions:
None
PMINW: Parallel Minimum Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMINW 00011 | 5..0: MMI1 101000
Format: PMINW rd, rs, rt
Purpose: To select the minimum of two 32-bit signed integers (4 parallel operations).
Description: rd ← min (rs, rt)
The four signed word values in GPR rt are subtracted from the corresponding four signed word values in GPR rs in parallel. If the result of a subtraction is larger than zero, the corresponding signed word value in GPR rt is placed into the corresponding word of GPR rd; otherwise, the corresponding signed word value in GPR rs is placed there. This instruction operates on 128-bit registers.
Operation:
if ((GPR[rs]_31..0 - GPR[rt]_31..0) > 0) then GPR[rd]_31..0 ← GPR[rt]_31..0 else GPR[rd]_31..0 ← GPR[rs]_31..0 endif
if ((GPR[rs]_63..32 - GPR[rt]_63..32) > 0) then GPR[rd]_63..32 ← GPR[rt]_63..32 else GPR[rd]_63..32 ← GPR[rs]_63..32 endif
if ((GPR[rs]_95..64 - GPR[rt]_95..64) > 0) then GPR[rd]_95..64 ← GPR[rt]_95..64 else GPR[rd]_95..64 ← GPR[rs]_95..64 endif
if ((GPR[rs]_127..96 - GPR[rt]_127..96) > 0) then GPR[rd]_127..96 ← GPR[rt]_127..96 else GPR[rd]_127..96 ← GPR[rs]_127..96 endif
[Figure: PMINW. With A3..A0 the words of rs and B3..B0 the words of rt, word i of rd receives min (Ai, Bi).]
Exceptions:
None
PMSUBH: Parallel Multiply-Subtract Halfword (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMSUBH 10100 | 5..0: MMI2 001001
Format: PMSUBH rd, rs, rt
Purpose: To multiply 8 pairs of 16-bit signed integers and subtract in parallel.
Description: (rd, HI, LO) ← (HI, LO) - rs x rt
The eight signed halfwords in GPR rs are multiplied by the eight signed halfwords in GPR rt in parallel. The eight word multiply results are subtracted from the corresponding words in special registers HI and LO, and the word results are placed into the corresponding words in special registers HI, LO and GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
prod0 ← LO_31..0 - GPR[rs]_15..0 x GPR[rt]_15..0
prod1 ← LO_63..32 - GPR[rs]_31..16 x GPR[rt]_31..16
prod2 ← HI_31..0 - GPR[rs]_47..32 x GPR[rt]_47..32
prod3 ← HI_63..32 - GPR[rs]_63..48 x GPR[rt]_63..48
prod4 ← LO_95..64 - GPR[rs]_79..64 x GPR[rt]_79..64
prod5 ← LO_127..96 - GPR[rs]_95..80 x GPR[rt]_95..80
prod6 ← HI_95..64 - GPR[rs]_111..96 x GPR[rt]_111..96
prod7 ← HI_127..96 - GPR[rs]_127..112 x GPR[rt]_127..112
LO_31..0 ← prod0_31..0
LO_63..32 ← prod1_31..0
HI_31..0 ← prod2_31..0
HI_63..32 ← prod3_31..0
LO_95..64 ← prod4_31..0
LO_127..96 ← prod5_31..0
HI_95..64 ← prod6_31..0
HI_127..96 ← prod7_31..0
GPR[rd]_31..0 ← prod0_31..0
GPR[rd]_63..32 ← prod2_31..0
GPR[rd]_95..64 ← prod4_31..0
GPR[rd]_127..96 ← prod6_31..0
[Figure: PMSUBH. With A7..A0 the halfwords of rs, B7..B0 the halfwords of rt, C7, C6, C3, C2 the words of HI and C5, C4, C1, C0 the words of LO (most significant first), each accumulator word Ci is replaced by Ci - Ai x Bi; rd receives the results for i = 6, 4, 2, 0.]
Exceptions:
None
Programming Notes:
See the Programming Notes for the PMADDH instruction.
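The way PMSUBH distributes its eight products across HI, LO and rd is easy to misread, so the following Python sketch spells out the mapping from the Operation listing. It is a hedged model only: `pmsubh`, `SLOTS`, `words` and `halves` are hypothetical names, registers are plain integers, and each 32-bit result is assumed to wrap modulo 2^32.

```python
MASK32 = (1 << 32) - 1

def s16(value):
    """Interpret the low 16 bits of value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def words(values):
    """Build a register image from 32-bit words, least significant first."""
    return sum((v & MASK32) << (32 * i) for i, v in enumerate(values))

def halves(values):
    """Build a register image from 16-bit halfwords, least significant first."""
    return sum((v & 0xFFFF) << (16 * i) for i, v in enumerate(values))

# Accumulator word used by product i: (register, word index), per the Operation listing.
SLOTS = [("lo", 0), ("lo", 1), ("hi", 0), ("hi", 1),
         ("lo", 2), ("lo", 3), ("hi", 2), ("hi", 3)]

def pmsubh(rs, rt, hi, lo):
    """Return (rd, hi, lo) after a PMSUBH-style multiply-subtract (illustrative)."""
    old = {"hi": hi, "lo": lo}
    new = {"hi": 0, "lo": 0}
    rd = 0
    for i, (name, w) in enumerate(SLOTS):
        acc = (old[name] >> (32 * w)) & MASK32
        result = (acc - s16(rs >> (16 * i)) * s16(rt >> (16 * i))) & MASK32
        new[name] |= result << (32 * w)
        if i % 2 == 0:                   # rd receives prod0, prod2, prod4, prod6
            rd |= result << (32 * (i // 2))
    return rd, new["hi"], new["lo"]
```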
PMSUBW: Parallel Multiply-Subtract Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMSUBW 00100 | 5..0: MMI2 001001
Format: PMSUBW rd, rs, rt
Purpose: To multiply 2 pairs of 32-bit signed integers and subtract in parallel.
Description: (rd, HI, LO) ← (HI, LO) - rs x rt
The low-order signed words of the two doublewords in GPR rs are multiplied by the low-order signed words of the two doublewords in GPR rt in parallel. The two 64-bit multiply results are subtracted from the contents of special registers HI and LO. The low-order words of the two doubleword results are placed into special register LO, and the high-order words of the two doubleword results are placed into special register HI. The two doubleword results are placed into GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain sign-extended 32-bit values (bits 127..95 all equal and bits 63..31 all equal), the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
prod0 ← (HI_31..0 || LO_31..0) - GPR[rs]_31..0 x GPR[rt]_31..0
prod1 ← (HI_95..64 || LO_95..64) - GPR[rs]_95..64 x GPR[rt]_95..64
LO_63..0 ← (prod0_31)^32 || prod0_31..0
HI_63..0 ← (prod0_63)^32 || prod0_63..32
LO_127..64 ← (prod1_31)^32 || prod1_31..0
HI_127..64 ← (prod1_63)^32 || prod1_63..32
GPR[rd]_63..0 ← prod0_63..0
GPR[rd]_127..64 ← prod1_63..0
[Figure: PMSUBW data flow. With A3..A0 the words of rs, B3..B0 the words of rt, C7, C6, C3, C2 the words of HI and C5, C4, C1, C0 the words of LO (most significant first): rd_127..64 = (C6 || C4) - A2 x B2 and rd_63..0 = (C2 || C0) - A0 x B0. HI receives bits 63..32 of each difference and LO receives bits 31..0 of each difference, each sign-extended to a doubleword.]
Exceptions:
None
Programming Notes:
See the Programming Notes for the PMADDH instruction.
PMTHI: Parallel Move To HI Register (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..11: 0 (0000000000) | 10..6: PMTHI 01000 | 5..0: MMI3 101001
Format: PMTHI rs
Purpose: To copy a GPR to the special purpose register HI.
Description: HI ← rs
The contents of GPR rs are loaded into special register HI. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
HI_127..0 ← GPR[rs]_127..0
[Figure: PMTHI. Both doublewords of rs (A1, A0) are copied unchanged to HI.]
Exceptions:
None
PMTHL.fmt: Parallel Move To HI / LO Register (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..11: 0 (0000000000) | 10..6: fmt | 5..0: PMTHL 110001
Format: PMTHL.LW rs (fmt = 0)
Purpose: To copy a GPR to the special registers HI / LO.
Description: HI / LO ← rs
The contents of GPR rs are loaded into special registers HI / LO. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
if (fmt = 0) then
LO_31..0 ← GPR[rs]_31..0
LO_63..32 ← LO_63..32
HI_31..0 ← GPR[rs]_63..32
HI_63..32 ← HI_63..32
LO_95..64 ← GPR[rs]_95..64
LO_127..96 ← LO_127..96
HI_95..64 ← GPR[rs]_127..96
HI_127..96 ← HI_127..96
endif
[Figure: PMTHL.LW. With A3..A0 the words of rs (most significant first), HI becomes (not changed), A3, (not changed), A1 and LO becomes (not changed), A2, (not changed), A0.]
Exceptions:
None
PMTLO: Parallel Move To LO Register (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..11: 0 (0000000000) | 10..6: PMTLO 01001 | 5..0: MMI3 101001
Format: PMTLO rs
Purpose: To copy a GPR to the special register LO.
Description: LO ← rs
The contents of GPR rs are loaded into special register LO. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
LO_127..0 ← GPR[rs]_127..0
[Figure: PMTLO. Both doublewords of rs (A1, A0) are copied unchanged to LO.]
Exceptions:
None
PMULTH: Parallel Multiply Halfword (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMULTH 11100 | 5..0: MMI2 001001
Format: PMULTH rd, rs, rt
Purpose: To multiply 8 pairs of 16-bit signed integers in parallel.
Description: (rd, LO, HI) ← rs x rt
The eight signed halfwords in GPR rs are multiplied by the eight signed halfwords in GPR rt in parallel. The eight word results are placed into special registers HI and LO and into GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
None
Operation:
prod0 ← GPR[rs]_15..0 x GPR[rt]_15..0
prod1 ← GPR[rs]_31..16 x GPR[rt]_31..16
prod2 ← GPR[rs]_47..32 x GPR[rt]_47..32
prod3 ← GPR[rs]_63..48 x GPR[rt]_63..48
prod4 ← GPR[rs]_79..64 x GPR[rt]_79..64
prod5 ← GPR[rs]_95..80 x GPR[rt]_95..80
prod6 ← GPR[rs]_111..96 x GPR[rt]_111..96
prod7 ← GPR[rs]_127..112 x GPR[rt]_127..112
LO_31..0 ← prod0_31..0
LO_63..32 ← prod1_31..0
HI_31..0 ← prod2_31..0
HI_63..32 ← prod3_31..0
LO_95..64 ← prod4_31..0
LO_127..96 ← prod5_31..0
HI_95..64 ← prod6_31..0
HI_127..96 ← prod7_31..0
GPR[rd]_31..0 ← prod0_31..0
GPR[rd]_63..32 ← prod2_31..0
GPR[rd]_95..64 ← prod4_31..0
GPR[rd]_127..96 ← prod6_31..0
[Figure: PMULTH. With A7..A0 the halfwords of rs and B7..B0 the halfwords of rt (most significant first): HI receives A7 x B7, A6 x B6, A3 x B3, A2 x B2; LO receives A5 x B5, A4 x B4, A1 x B1, A0 x B0; rd receives A6 x B6, A4 x B4, A2 x B2, A0 x B0.]
Exceptions:
None
Programming Notes:
See the Programming Notes of the PMADDH instruction.
PMULTUW: Parallel Multiply Unsigned Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMULTUW 01100 | 5..0: MMI3 101001
Format: PMULTUW rd, rs, rt
Purpose: To multiply 2 pairs of 32-bit unsigned integers in parallel.
Description: (rd, LO, HI) ← rs x rt
The low-order unsigned words of the two doublewords in GPR rs are multiplied by the low-order unsigned words of the two doublewords in GPR rt in parallel. The low-order words of the two doubleword results are placed into special register LO, and the high-order words of the two doubleword results are placed into special register HI. The two doubleword results are placed into GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain zero-extended 32-bit values (bits 127..96 and 63..32 equal to zero), the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
prod0 ← (0 || GPR[rs]_31..0) x (0 || GPR[rt]_31..0)
prod1 ← (0 || GPR[rs]_95..64) x (0 || GPR[rt]_95..64)
LO_63..0 ← (prod0_31)^32 || prod0_31..0
HI_63..0 ← (prod0_63)^32 || prod0_63..32
LO_127..64 ← (prod1_31)^32 || prod1_31..0
HI_127..64 ← (prod1_63)^32 || prod1_63..32
GPR[rd]_63..0 ← prod0
GPR[rd]_127..64 ← prod1
[Figure: PMULTUW data flow. With A3..A0 the words of rs and B3..B0 the words of rt (most significant first): rd_127..64 = (0 || A2) x (0 || B2) and rd_63..0 = (0 || A0) x (0 || B0). HI receives bits 63..32 of each product and LO receives bits 31..0 of each product, each sign-extended to a doubleword.]
Exceptions:
None
Programming Notes:
See the Programming Notes of the PMADDH instruction.
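One detail of PMULTUW that is worth seeing in code is that the multiplication is unsigned while the words written to HI and LO are still sign-extended. The sketch below is an illustrative Python model under stated assumptions: `pmultuw` and `sign_extend_word` are hypothetical names, registers are integers holding 128-bit images, and the undefined-result case for operands that are not zero-extended words is not modelled.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sign_extend_word(value):
    """Sign-extend the low 32 bits of value to a 64-bit bit pattern."""
    value &= MASK32
    return value | (MASK64 ^ MASK32) if value >> 31 else value

def pmultuw(rs, rt):
    """Return (rd, hi, lo) for a PMULTUW-style unsigned multiply (illustrative)."""
    rd = hi = lo = 0
    for half in (0, 1):
        shift = 64 * half
        # unsigned 32 x 32 multiply; the product fits in 64 bits
        prod = ((rs >> shift) & MASK32) * ((rt >> shift) & MASK32)
        rd |= prod << shift
        lo |= sign_extend_word(prod) << shift
        hi |= sign_extend_word(prod >> 32) << shift
    return rd, hi, lo
```

Multiplying 0xFFFFFFFF by 2 in the low doubleword gives a product of 0x1FFFFFFFE, so rd holds that value, while LO holds the low word 0xFFFFFFFE sign-extended to 0xFFFFFFFFFFFFFFFE.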
PMULTW: Parallel Multiply Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PMULTW 01100 | 5..0: MMI2 001001
Format: PMULTW rd, rs, rt
Purpose: To multiply 2 pairs of 32-bit signed integers in parallel.
Description: (rd, LO, HI) ← rs x rt
The low-order signed words of the two doublewords in GPR rs are multiplied by the low-order signed words of the two doublewords in GPR rt in parallel. The low-order words of the two doubleword results are placed into special register LO, and the high-order words of the two doubleword results are placed into special register HI. The two doubleword results are placed into GPR rd. No arithmetic exception occurs under any circumstances. This instruction operates on 128-bit registers.
Restrictions:
If either GPR rt or GPR rs does not contain sign-extended 32-bit values (bits 127..95 all equal and bits 63..31 all equal), the result of the operation is undefined.
Operation:
if (NotWordValue(GPR[rs]) or NotWordValue(GPR[rt])) then UndefinedResult() endif
prod0 ← GPR[rs]_31..0 x GPR[rt]_31..0
prod1 ← GPR[rs]_95..64 x GPR[rt]_95..64
LO_63..0 ← (prod0_31)^32 || prod0_31..0
HI_63..0 ← (prod0_63)^32 || prod0_63..32
LO_127..64 ← (prod1_31)^32 || prod1_31..0
HI_127..64 ← (prod1_63)^32 || prod1_63..32
GPR[rd]_63..0 ← prod0
GPR[rd]_127..64 ← prod1
[Figure: PMULTW data flow. With A3..A0 the words of rs and B3..B0 the words of rt (most significant first): rd_127..64 = A2 x B2 and rd_63..0 = A0 x B0. HI receives bits 63..32 of each product and LO receives bits 31..0 of each product, each sign-extended to a doubleword.]
Exceptions:
None
Programming Notes:
See the Programming Notes of the PMADDH instruction.
PNOR: Parallel Not Or (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PNOR 10011 | 5..0: MMI3 101001
Format: PNOR rd, rs, rt
Purpose: To do a bitwise logical NOT OR (NOR).
Description: rd ← rs NOR rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical NOR operation. The result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_127..0 ← GPR[rs]_127..0 nor GPR[rt]_127..0
[Figure: PNOR. With A1, A0 the doublewords of rs and B1, B0 the doublewords of rt, rd receives A1 NOR B1 and A0 NOR B0.]
Exceptions:
None
POR: Parallel Or (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: POR 10010 | 5..0: MMI3 101001
Format: POR rd, rs, rt
Purpose: To do a bitwise logical OR.
Description: rd ← rs OR rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical OR operation. The result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_127..0 ← GPR[rs]_127..0 or GPR[rt]_127..0
[Figure: POR. With A1, A0 the doublewords of rs and B1, B0 the doublewords of rt, rd receives A1 OR B1 and A0 OR B0.]
Exceptions:
None
PPAC5: Parallel Pack to 5-bits (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: 0 (00000) | 20..16: rt | 15..11: rd | 10..6: PPAC5 11111 | 5..0: MMI0 001000
Format: PPAC5 rd, rt
Purpose: To truncate and pack data into consecutive 5-bits.
Description: rd ← pack (rt)
The four 32-bit words (8, 8, 8, 8 bit) in GPR rt are packed into the four 16-bit halfwords (1, 5, 5, 5 bit). The results are placed into GPR rd. See diagram on next page. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_4..0 ← GPR[rt]_7..3
GPR[rd]_9..5 ← GPR[rt]_15..11
GPR[rd]_14..10 ← GPR[rt]_23..19
GPR[rd]_15 ← GPR[rt]_31
GPR[rd]_31..16 ← 0^16
GPR[rd]_36..32 ← GPR[rt]_39..35
GPR[rd]_41..37 ← GPR[rt]_47..43
GPR[rd]_46..42 ← GPR[rt]_55..51
GPR[rd]_47 ← GPR[rt]_63
GPR[rd]_63..48 ← 0^16
GPR[rd]_68..64 ← GPR[rt]_71..67
GPR[rd]_73..69 ← GPR[rt]_79..75
GPR[rd]_78..74 ← GPR[rt]_87..83
GPR[rd]_79 ← GPR[rt]_95
GPR[rd]_95..80 ← 0^16
GPR[rd]_100..96 ← GPR[rt]_103..99
GPR[rd]_105..101 ← GPR[rt]_111..107
GPR[rd]_110..106 ← GPR[rt]_119..115
GPR[rd]_111 ← GPR[rt]_127
GPR[rd]_127..112 ← 0^16
[Figure: PPAC5. Overview: each 32-bit word of rt is packed into the low-order halfword of the corresponding word of rd. Detail of word region 31..0: the 8-bit fields A3 (bits 31..24), A2 (23..16), A1 (15..8) and A0 (7..0) of rt are truncated to A3 (1 bit, rd bit 15), A2 (5 bits, rd bits 14..10), A1 (5 bits, rd bits 9..5) and A0 (5 bits, rd bits 4..0); rd bits 31..16 are zero.]
Exceptions:
None
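The bit selection performed by PPAC5 can be sketched in Python as follows. This is an illustrative model of the Operation listing only: the function name `ppac5` is hypothetical and the register is a plain integer holding a 128-bit image.

```python
def ppac5(rt):
    """Pack each 8-8-8-8 word of rt into a 1-5-5-5 halfword (illustrative)."""
    rd = 0
    for i in range(4):
        w = (rt >> (32 * i)) & 0xFFFFFFFF
        packed = (
            ((w >> 3) & 0x1F)              # bits 7..3   -> bits 4..0
            | (((w >> 11) & 0x1F) << 5)    # bits 15..11 -> bits 9..5
            | (((w >> 19) & 0x1F) << 10)   # bits 23..19 -> bits 14..10
            | ((w >> 31) << 15)            # bit 31      -> bit 15
        )
        rd |= packed << (32 * i)           # bits 31..16 of each word stay zero
    return rd
```

Only the five high-order bits of each of the three low bytes survive, so a word such as 0x7F070707 packs to zero.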
PPACB: Parallel Pack to Byte (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PPACB 11011 | 5..0: MMI0 001000
Format: PPACB rd, rs, rt
Purpose: To pack into consecutive bytes.
Description: rd ← pack (rs, rt)
The low-order bytes of the eight halfwords in GPR rs are packed into consecutive bytes of the high-order doubleword in GPR rd. Similarly, the low-order bytes of the eight halfwords in GPR rt are packed into consecutive bytes of the low-order doubleword in GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_7..0 ← GPR[rt]_7..0
GPR[rd]_15..8 ← GPR[rt]_23..16
GPR[rd]_23..16 ← GPR[rt]_39..32
GPR[rd]_31..24 ← GPR[rt]_55..48
GPR[rd]_39..32 ← GPR[rt]_71..64
GPR[rd]_47..40 ← GPR[rt]_87..80
GPR[rd]_55..48 ← GPR[rt]_103..96
GPR[rd]_63..56 ← GPR[rt]_119..112
GPR[rd]_71..64 ← GPR[rs]_7..0
GPR[rd]_79..72 ← GPR[rs]_23..16
GPR[rd]_87..80 ← GPR[rs]_39..32
GPR[rd]_95..88 ← GPR[rs]_55..48
GPR[rd]_103..96 ← GPR[rs]_71..64
GPR[rd]_111..104 ← GPR[rs]_87..80
GPR[rd]_119..112 ← GPR[rs]_103..96
GPR[rd]_127..120 ← GPR[rs]_119..112
[Figure: PPACB. With A7..A0 the low-order bytes of the eight halfwords of rs and B7..B0 the low-order bytes of the eight halfwords of rt, rd receives A7, A6, A5, A4, A3, A2, A1, A0, B7, B6, B5, B4, B3, B2, B1, B0 (most significant byte first).]
Exceptions:
None
PPACH: Parallel Pack to Halfword (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PPACH 10111 | 5..0: MMI0 001000
Format: PPACH rd, rs, rt
Purpose: To pack into consecutive halfwords.
Description: rd ← pack (rs, rt)
The low-order halfwords of the four words in GPR rs are packed into consecutive halfwords of the high-order doubleword in GPR rd. Similarly, the low-order halfwords of the four words in GPR rt are packed into consecutive halfwords of the low-order doubleword in GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_15..0 ← GPR[rt]_15..0
GPR[rd]_31..16 ← GPR[rt]_47..32
GPR[rd]_47..32 ← GPR[rt]_79..64
GPR[rd]_63..48 ← GPR[rt]_111..96
GPR[rd]_79..64 ← GPR[rs]_15..0
GPR[rd]_95..80 ← GPR[rs]_47..32
GPR[rd]_111..96 ← GPR[rs]_79..64
GPR[rd]_127..112 ← GPR[rs]_111..96
[Figure: PPACH. With A3..A0 the low-order halfwords of the four words of rs and B3..B0 the low-order halfwords of the four words of rt, rd receives A3, A2, A1, A0, B3, B2, B1, B0 (most significant halfword first).]
Exceptions:
None
PPACW: Parallel Pack to Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PPACW 10011 | 5..0: MMI0 001000
Format: PPACW rd, rs, rt
Purpose: To pack into consecutive words.
Description: rd ← pack (rs, rt)
The low-order words of the two doublewords in GPR rs are packed into consecutive words of the high-order doubleword in GPR rd. Similarly, the low-order words of the two doublewords in GPR rt are packed into consecutive words of the low-order doubleword in GPR rd. This instruction operates on 128-bit registers.
Operation:
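GPR[rd]_31..0 ← GPR[rt]_31..0
GPR[rd]_63..32 ← GPR[rt]_95..64
GPR[rd]_95..64 ← GPR[rs]_31..0
GPR[rd]_127..96 ← GPR[rs]_95..64
[Figure: PPACW. With A1, A0 the low-order words of the two doublewords of rs and B1, B0 the low-order words of the two doublewords of rt, rd receives A1, A0, B1, B0 (most significant word first).]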
Exceptions:
None
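PPACB, PPACH and PPACW follow one pattern: keep the low-order half of every element of rt and rs, placing the rt results in the low doubleword of rd and the rs results in the high doubleword. The Python sketch below expresses that shared pattern with the element width as a parameter. The helper `pack_low_parts` and the wrapper names are hypothetical, chosen for illustration, and registers are plain integers holding 128-bit images.

```python
def pack_low_parts(rs, rt, width):
    """Keep the low `width` bits of every 2*width-bit element of rt and rs."""
    mask = (1 << width) - 1
    count = 64 // width                  # elements contributed by each source
    rd = 0
    for i in range(count):
        rd |= ((rt >> (2 * width * i)) & mask) << (width * i)         # low doubleword
        rd |= ((rs >> (2 * width * i)) & mask) << (64 + width * i)    # high doubleword
    return rd

def ppacb(rs, rt):
    return pack_low_parts(rs, rt, 8)

def ppach(rs, rt):
    return pack_low_parts(rs, rt, 16)

def ppacw(rs, rt):
    return pack_low_parts(rs, rt, 32)
```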
PREVH: Parallel Reverse Halfword (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: 0 (00000) | 20..16: rt | 15..11: rd | 10..6: PREVH 11011 | 5..0: MMI2 001001
Format: PREVH rd, rt
Purpose: To reverse halfwords.
Description: rd ← reverse (rt)
The order of the four halfwords in the high-order doubleword of GPR rt is reversed, and the order of the four halfwords in the low-order doubleword of GPR rt is reversed. The results are placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_15..0 ← GPR[rt]_63..48
GPR[rd]_31..16 ← GPR[rt]_47..32
GPR[rd]_47..32 ← GPR[rt]_31..16
GPR[rd]_63..48 ← GPR[rt]_15..0
GPR[rd]_79..64 ← GPR[rt]_127..112
GPR[rd]_95..80 ← GPR[rt]_111..96
GPR[rd]_111..96 ← GPR[rt]_95..80
GPR[rd]_127..112 ← GPR[rt]_79..64
[Figure: PREVH. With A7..A0 the halfwords of rt, rd receives A4, A5, A6, A7, A0, A1, A2, A3 (most significant halfword first).]
Exceptions:
None
PROT3W: Parallel Rotate 3 Words Left (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: 0 (00000) | 20..16: rt | 15..11: rd | 10..6: PROT3W 11111 | 5..0: MMI2 001001
Format: PROT3W rd, rt
Purpose: To rotate words.
Description: rd ← rotate (rt)
The three low-order words in GPR rt are rotated right by one word position: words 1 and 2 move down to words 0 and 1, and word 0 moves up to word 2. The results are placed into GPR rd, while the high-order word is copied directly to the corresponding word in GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]_31..0 ← GPR[rt]_63..32
GPR[rd]_63..32 ← GPR[rt]_95..64
GPR[rd]_95..64 ← GPR[rt]_31..0
GPR[rd]_127..96 ← GPR[rt]_127..96
[Figure: PROT3W. With A3..A0 the words of rt, rd receives A3, A0, A2, A1 (most significant word first).]
Exceptions:
None
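Because the word movement in PROT3W is easy to get backwards, here is a short Python sketch of the permutation given in the Operation listing. The function name `prot3w` is hypothetical and the register is a plain integer holding a 128-bit image.

```python
MASK32 = (1 << 32) - 1

def prot3w(rt):
    """Rotate the three low-order words of rt; the high-order word is unchanged."""
    w = [(rt >> (32 * i)) & MASK32 for i in range(4)]
    # rd word 0 <- rt word 1, word 1 <- rt word 2, word 2 <- rt word 0, word 3 <- rt word 3
    out = [w[1], w[2], w[0], w[3]]
    return sum(v << (32 * i) for i, v in enumerate(out))
```

Applying the function three times in a row returns the original value, since the rotation involves exactly three words.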
PSLLH: Parallel Shift Left Logical Halfword (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: 0 (00000) | 20..16: rt | 15..11: rd | 10..6: sa | 5..0: PSLLH 110100
Format: PSLLH rd, rt, sa
Purpose: To logically shift left 8 halfwords by a fixed number of bits, in parallel.
Description: rd ← rt << sa (logical)
The eight halfwords in GPR rt are shifted left in parallel, inserting zeros into the emptied bits; the results are placed into the corresponding eight halfwords in GPR rd. The bit shift count is specified by the low-order four bits of sa. This instruction operates on 128-bit registers.
Operation:
s ← sa_3..0
GPR[rd]_15..0 ← GPR[rt]_(15-s)..0 || 0^s
GPR[rd]_31..16 ← GPR[rt]_(31-s)..16 || 0^s
GPR[rd]_47..32 ← GPR[rt]_(47-s)..32 || 0^s
GPR[rd]_63..48 ← GPR[rt]_(63-s)..48 || 0^s
GPR[rd]_79..64 ← GPR[rt]_(79-s)..64 || 0^s
GPR[rd]_95..80 ← GPR[rt]_(95-s)..80 || 0^s
GPR[rd]_111..96 ← GPR[rt]_(111-s)..96 || 0^s
GPR[rd]_127..112 ← GPR[rt]_(127-s)..112 || 0^s
[Figure: PSLLH. Each halfword Ai of rt (A7..A0) is shifted left by s bits within its own 16-bit lane of rd; the s low-order bits of each lane are filled with zeros.]
Exceptions:
None
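The per-lane behaviour of PSLLH, in which bits shifted out of one halfword never reach its neighbour, can be sketched in Python as follows. The function name `psllh` is hypothetical and registers are plain integers holding 128-bit images.

```python
def psllh(rt, sa):
    """Shift each of the eight halfwords of rt left by sa (low four bits) bits."""
    s = sa & 0xF                          # only the low-order four bits of sa are used
    rd = 0
    for i in range(8):
        half = (rt >> (16 * i)) & 0xFFFF
        rd |= ((half << s) & 0xFFFF) << (16 * i)   # discard bits leaving the lane
    return rd
```

With every halfword equal to 0x8001 and a shift of one bit, every result halfword is 0x0002: the top bit is discarded rather than carried into the next lane.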
PSLLVW: Parallel Shift Left Logical Variable Word (C790)
Encoding (bits: field): 31..26: MMI 011100 | 25..21: rs | 20..16: rt | 15..11: rd | 10..6: PSLLVW 00010 | 5..0: MMI2 001001
Format: PSLLVW rd, rt, rs
Purpose: To logically shift left 2 words by a variable number of bits, in parallel.
Description: rd ← rt << rs (logical)
The low-order words of the two doublewords in GPR rt are shifted left in parallel, inserting zeros into the emptied bits; the results are placed into the corresponding two words in GPR rd. The bit shift counts are specified by the low-order five bits of the two doublewords in GPR rs. This instruction operates on 128-bit registers.
Operation:
s0 s1 temp0 temp1 GPR[rd]63..0 GPR[rd]127..64
127
GPR[rs]4..0 GPR[rs]68..64 GPR[rt](31-s0)..0 || 0s0 GPR[rt](95-s1)..64 || 0s1 (temp031)32 || temp031..0 (temp131)32 || temp131..0
68 64 63 4 0
rs
127 96 95
s1
64 63 32 31
s0
0
rt
A1
A0
127
96 95
64 63
32 31
0
rd
sign ext
A1
0
s1
sign ext
A0
0
s0
s1 bit
s0 bit
Exceptions:
None
PSLLW    Parallel Shift Left Logical Word    C790

Bits:   31..26      25..21   20..16  15..11  10..6  5..0
Field:  MMI 011100  0 00000  rt      rd      sa     PSLLW 111100
Width:  6           5        5       5       5      6

Format: PSLLW rd, rt, sa
Purpose: To logically shift left 4 words by a fixed number of bits, in parallel.
Description: rd ← rt << sa (logical)
The four words in GPR rt are shifted left in parallel by the number of bits given in the five-bit sa field, inserting zeros into the emptied bits; the results are placed into the corresponding four words in GPR rd. This instruction operates on 128-bit registers.
Operation:
s ← sa4..0
GPR[rd]31..0 ← GPR[rt](31-s)..0 || 0^s
GPR[rd]63..32 ← GPR[rt](63-s)..32 || 0^s
GPR[rd]95..64 ← GPR[rt](95-s)..64 || 0^s
GPR[rd]127..96 ← GPR[rt](127-s)..96 || 0^s
[Figure: each word A3..A0 of rt is shifted left by s bits into the corresponding word of rd; the vacated low-order s bits are filled with zeros.]
Exceptions:
None
PSRAH    Parallel Shift Right Arithmetic Halfword    C790

Bits:   31..26      25..21   20..16  15..11  10..6  5..0
Field:  MMI 011100  0 00000  rt      rd      sa     PSRAH 110111
Width:  6           5        5       5       5      6

Format: PSRAH rd, rt, sa
Purpose: To arithmetically shift right 8 halfwords by a fixed number of bits, in parallel.
Description: rd ← rt >> sa (arithmetic)
The eight halfwords in GPR rt are shifted right in parallel, sign-extending the high-order bits; the results are placed into the corresponding eight halfwords in GPR rd. The bit shift count is specified by the low-order four bits of sa. This instruction operates on 128-bit registers.
Operation:
s ← sa3..0
GPR[rd]15..0 ← (GPR[rt]15)^s || GPR[rt]15..s
GPR[rd]31..16 ← (GPR[rt]31)^s || GPR[rt]31..(16+s)
GPR[rd]47..32 ← (GPR[rt]47)^s || GPR[rt]47..(32+s)
GPR[rd]63..48 ← (GPR[rt]63)^s || GPR[rt]63..(48+s)
GPR[rd]79..64 ← (GPR[rt]79)^s || GPR[rt]79..(64+s)
GPR[rd]95..80 ← (GPR[rt]95)^s || GPR[rt]95..(80+s)
GPR[rd]111..96 ← (GPR[rt]111)^s || GPR[rt]111..(96+s)
GPR[rd]127..112 ← (GPR[rt]127)^s || GPR[rt]127..(112+s)
[Figure: each halfword A7..A0 of rt is shifted right by s bits into the corresponding halfword of rd; the high-order s bits are filled with copies of the sign bit.]
Exceptions:
None
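The sign-replicating shift in the PSRAH operation can be sketched in C without relying on the implementation-defined behavior of >> on negative values. The names gpr128_h, sra16 and psrah are hypothetical and introduced only for this illustration; lane h[0] is assumed to hold bits 15..0.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as eight 16-bit lanes; h[0] = bits 15..0. */
typedef struct { uint16_t h[8]; } gpr128_h;

/* Arithmetic right shift of one 16-bit lane by s (0..15) bits. */
static uint16_t sra16(uint16_t x, unsigned s)
{
    uint16_t r = (uint16_t)(x >> s);          /* logical shift first */
    if (x & 0x8000u) {
        /* Replicate the sign bit into the s vacated high-order bits. */
        r |= (uint16_t)~(0xFFFFu >> s);
    }
    return r;
}

/* Models PSRAH rd, rt, sa: only the low-order four bits of sa are used. */
static gpr128_h psrah(gpr128_h rt, unsigned sa)
{
    gpr128_h rd;
    unsigned s = sa & 0xF;
    for (int i = 0; i < 8; i++) {
        rd.h[i] = sra16(rt.h[i], s);
    }
    return rd;
}
```

With s = 0 the mask ~(0xFFFF >> 0) truncates to zero, so the lane is returned unchanged.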
PSRAVW    Parallel Shift Right Arithmetic Variable Word    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSRAVW 00011  MMI3 101001
Width:  6           5       5       5       5             6

Format: PSRAVW rd, rt, rs
Purpose: To arithmetically shift right 2 words by a variable number of bits, in parallel.
Description: rd ← rt >> rs (arithmetic)
The low-order words of the two doublewords in GPR rt are shifted right in parallel, sign-extending the high-order bits; each 32-bit result is sign-extended and placed into the corresponding doubleword in GPR rd. The bit shift counts are specified by the low-order five bits of the two doublewords in GPR rs. This instruction operates on 128-bit registers.
Operation:
s0 ← GPR[rs]4..0
s1 ← GPR[rs]68..64
temp0 ← (GPR[rt]31)^s0 || GPR[rt]31..s0
temp1 ← (GPR[rt]95)^s1 || GPR[rt]95..(64+s1)
GPR[rd]63..0 ← (temp0_31)^32 || temp0_31..0
GPR[rd]127..64 ← (temp1_31)^32 || temp1_31..0
[Figure: s0 and s1 are taken from bits 4..0 and 68..64 of rs. Words A0 (bits 31..0) and A1 (bits 95..64) of rt are shifted right by s0 and s1 bits with sign fill, and each result is sign-extended into the corresponding doubleword of rd.]
Exceptions:
None
PSRAW    Parallel Shift Right Arithmetic Word    C790

Bits:   31..26      25..21   20..16  15..11  10..6  5..0
Field:  MMI 011100  0 00000  rt      rd      sa     PSRAW 111111
Width:  6           5        5       5       5      6

Format: PSRAW rd, rt, sa
Purpose: To arithmetically shift right 4 words by a fixed number of bits, in parallel.
Description: rd ← rt >> sa (arithmetic)
The four words in GPR rt are shifted right in parallel by the number of bits given in the five-bit sa field, sign-extending the high-order bits; the results are placed into the corresponding four words in GPR rd. This instruction operates on 128-bit registers.
Operation:
s ← sa4..0
GPR[rd]31..0 ← (GPR[rt]31)^s || GPR[rt]31..s
GPR[rd]63..32 ← (GPR[rt]63)^s || GPR[rt]63..(32+s)
GPR[rd]95..64 ← (GPR[rt]95)^s || GPR[rt]95..(64+s)
GPR[rd]127..96 ← (GPR[rt]127)^s || GPR[rt]127..(96+s)
[Figure: each word A3..A0 of rt is shifted right by s bits into the corresponding word of rd; the high-order s bits are filled with copies of the sign bit.]
Exceptions:
None
PSRLH    Parallel Shift Right Logical Halfword    C790

Bits:   31..26      25..21   20..16  15..11  10..6  5..0
Field:  MMI 011100  0 00000  rt      rd      sa     PSRLH 110110
Width:  6           5        5       5       5      6

Format: PSRLH rd, rt, sa
Purpose: To logically shift right 8 halfwords by a fixed number of bits, in parallel.
Description: rd ← rt >> sa (logical)
The eight halfwords in GPR rt are shifted right in parallel, inserting zeros into the high-order bits; the results are placed into the corresponding eight halfwords in GPR rd. The bit shift count is specified by the low-order four bits of sa. This instruction operates on 128-bit registers.
Operation:
s ← sa3..0
GPR[rd]15..0 ← 0^s || GPR[rt]15..s
GPR[rd]31..16 ← 0^s || GPR[rt]31..(16+s)
GPR[rd]47..32 ← 0^s || GPR[rt]47..(32+s)
GPR[rd]63..48 ← 0^s || GPR[rt]63..(48+s)
GPR[rd]79..64 ← 0^s || GPR[rt]79..(64+s)
GPR[rd]95..80 ← 0^s || GPR[rt]95..(80+s)
GPR[rd]111..96 ← 0^s || GPR[rt]111..(96+s)
GPR[rd]127..112 ← 0^s || GPR[rt]127..(112+s)
[Figure: each halfword A7..A0 of rt is shifted right by s bits into the corresponding halfword of rd; the high-order s bits are filled with zeros.]
Exceptions:
None
PSRLVW    Parallel Shift Right Logical Variable Word    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSRLVW 00011  MMI2 001001
Width:  6           5       5       5       5             6

Format: PSRLVW rd, rt, rs
Purpose: To logically shift right 2 words by a variable number of bits, in parallel.
Description: rd ← rt >> rs (logical)
The low-order words of the two doublewords in GPR rt are shifted right in parallel, inserting zeros into the high-order bits. Each 32-bit result is then sign-extended and placed into the corresponding doubleword in GPR rd. The bit shift counts are specified by the low-order five bits of the two doublewords in GPR rs. This instruction operates on 128-bit registers.
Operation:
s0 ← GPR[rs]4..0
s1 ← GPR[rs]68..64
temp0 ← 0^s0 || GPR[rt]31..s0
temp1 ← 0^s1 || GPR[rt]95..(64+s1)
GPR[rd]63..0 ← (temp0_31)^32 || temp0_31..0
GPR[rd]127..64 ← (temp1_31)^32 || temp1_31..0
[Figure: s0 and s1 are taken from bits 4..0 and 68..64 of rs. Words A0 (bits 31..0) and A1 (bits 95..64) of rt are shifted right by s0 and s1 bits with zero fill, and each result is sign-extended into the corresponding doubleword of rd.]
Exceptions:
None
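The combination of a logical shift with a sign-extended result in PSRLVW is easy to misread, so a small C model may help. This is a sketch under stated assumptions: gpr128_d and psrlvw are hypothetical names, and d[0] is assumed to hold bits 63..0 of the register.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as two 64-bit lanes; d[0] = bits 63..0. */
typedef struct { uint64_t d[2]; } gpr128_d;

/* Models PSRLVW rd, rt, rs following the Operation pseudocode above. */
static gpr128_d psrlvw(gpr128_d rt, gpr128_d rs)
{
    gpr128_d rd;
    for (int i = 0; i < 2; i++) {
        unsigned s = (unsigned)(rs.d[i] & 0x1F);   /* low-order five bits */
        uint32_t temp = (uint32_t)rt.d[i] >> s;    /* zero-filled shift of the low word */
        /* Copy bit 31 of temp into bits 63..32 of the destination doubleword. */
        rd.d[i] = (temp & UINT32_C(0x80000000))
                      ? (UINT64_C(0xFFFFFFFF00000000) | temp)
                      : (uint64_t)temp;
    }
    return rd;
}
```

In this model the upper half of a result is all ones only when the shift count is 0 and bit 31 of the source word is set; any nonzero count clears bit 31 of the intermediate value.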
PSRLW    Parallel Shift Right Logical Word    C790

Bits:   31..26      25..21   20..16  15..11  10..6  5..0
Field:  MMI 011100  0 00000  rt      rd      sa     PSRLW 111110
Width:  6           5        5       5       5      6

Format: PSRLW rd, rt, sa
Purpose: To logically shift right 4 words by a fixed number of bits, in parallel.
Description: rd ← rt >> sa (logical)
The four words in GPR rt are shifted right in parallel by the number of bits given in the five-bit sa field, inserting zeros into the high-order bits; the results are placed into the corresponding four words in GPR rd. This instruction operates on 128-bit registers.
Operation:
s ← sa4..0
GPR[rd]31..0 ← 0^s || GPR[rt]31..s
GPR[rd]63..32 ← 0^s || GPR[rt]63..(32+s)
GPR[rd]95..64 ← 0^s || GPR[rt]95..(64+s)
GPR[rd]127..96 ← 0^s || GPR[rt]127..(96+s)
[Figure: each word A3..A0 of rt is shifted right by s bits into the corresponding word of rd; the high-order s bits are filled with zeros.]
Exceptions:
None
PSUBB    Parallel Subtract Byte    C790

Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PSUBB 01001  MMI0 001000
Width:  6           5       5       5       5            6

Format: PSUBB rd, rs, rt
Purpose: To subtract 16 pairs of 8-bit integers in parallel.
Description: rd ← rs - rt
The sixteen signed byte values in GPR rt are subtracted from the corresponding sixteen byte values in GPR rs in parallel. The results are placed into the corresponding sixteen bytes in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]7..0 ← (GPR[rs]7..0 - GPR[rt]7..0)7..0
GPR[rd]15..8 ← (GPR[rs]15..8 - GPR[rt]15..8)7..0
GPR[rd]23..16 ← (GPR[rs]23..16 - GPR[rt]23..16)7..0
GPR[rd]31..24 ← (GPR[rs]31..24 - GPR[rt]31..24)7..0
GPR[rd]39..32 ← (GPR[rs]39..32 - GPR[rt]39..32)7..0
GPR[rd]47..40 ← (GPR[rs]47..40 - GPR[rt]47..40)7..0
GPR[rd]55..48 ← (GPR[rs]55..48 - GPR[rt]55..48)7..0
GPR[rd]63..56 ← (GPR[rs]63..56 - GPR[rt]63..56)7..0
GPR[rd]71..64 ← (GPR[rs]71..64 - GPR[rt]71..64)7..0
GPR[rd]79..72 ← (GPR[rs]79..72 - GPR[rt]79..72)7..0
GPR[rd]87..80 ← (GPR[rs]87..80 - GPR[rt]87..80)7..0
GPR[rd]95..88 ← (GPR[rs]95..88 - GPR[rt]95..88)7..0
GPR[rd]103..96 ← (GPR[rs]103..96 - GPR[rt]103..96)7..0
GPR[rd]111..104 ← (GPR[rs]111..104 - GPR[rt]111..104)7..0
GPR[rd]119..112 ← (GPR[rs]119..112 - GPR[rt]119..112)7..0
GPR[rd]127..120 ← (GPR[rs]127..120 - GPR[rt]127..120)7..0
[Figure: each byte B15..B0 of rt is subtracted from the corresponding byte A15..A0 of rs; rd holds A15-B15 .. A0-B0.]
Exceptions:
None
PSUBH    Parallel Subtract Halfword    C790

Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PSUBH 00101  MMI0 001000
Width:  6           5       5       5       5            6

Format: PSUBH rd, rs, rt
Purpose: To subtract 8 pairs of 16-bit integers in parallel.
Description: rd ← rs - rt
The eight signed halfwords in GPR rt are subtracted from the corresponding eight halfwords in GPR rs in parallel. The results are placed into the corresponding eight halfwords in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]15..0 ← (GPR[rs]15..0 - GPR[rt]15..0)15..0
GPR[rd]31..16 ← (GPR[rs]31..16 - GPR[rt]31..16)15..0
GPR[rd]47..32 ← (GPR[rs]47..32 - GPR[rt]47..32)15..0
GPR[rd]63..48 ← (GPR[rs]63..48 - GPR[rt]63..48)15..0
GPR[rd]79..64 ← (GPR[rs]79..64 - GPR[rt]79..64)15..0
GPR[rd]95..80 ← (GPR[rs]95..80 - GPR[rt]95..80)15..0
GPR[rd]111..96 ← (GPR[rs]111..96 - GPR[rt]111..96)15..0
GPR[rd]127..112 ← (GPR[rs]127..112 - GPR[rt]127..112)15..0
[Figure: each halfword B7..B0 of rt is subtracted from the corresponding halfword A7..A0 of rs; rd holds A7-B7 .. A0-B0.]
Exceptions:
None
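Since PSUBH keeps only bits 15..0 of each difference and raises no exception, its results wrap modulo 2^16. A minimal C sketch of that behavior follows; gpr128_h and psubh are hypothetical names used only for illustration, with h[0] assumed to hold bits 15..0.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as eight 16-bit lanes; h[0] = bits 15..0. */
typedef struct { uint16_t h[8]; } gpr128_h;

/* Models PSUBH rd, rs, rt: each difference is truncated to 16 bits. */
static gpr128_h psubh(gpr128_h rs, gpr128_h rt)
{
    gpr128_h rd;
    for (int i = 0; i < 8; i++) {
        /* The cast discards everything above bit 15, giving wraparound. */
        rd.h[i] = (uint16_t)(rs.h[i] - rt.h[i]);
    }
    return rd;
}
```

For example, a lane holding 0x8000 (the most negative signed halfword) minus 1 yields 0x7FFF in this model, which is the wraparound that the saturating variants avoid.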
PSUBSB    Parallel Subtract with Signed Saturation Byte    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSUBSB 11001  MMI0 001000
Width:  6           5       5       5       5             6

Format: PSUBSB rd, rs, rt
Purpose: To subtract 16 pairs of 8-bit signed integers with saturation in parallel.
Description: rd ← rs - rt
The sixteen signed bytes in GPR rt are subtracted from the corresponding sixteen signed bytes in GPR rs in parallel. The results are placed into the corresponding sixteen bytes in GPR rd. No overflow or underflow exceptions are generated under any circumstances. Results beyond the range of a signed byte value are saturated according to the following:
Overflow: 0x7F
Underflow: 0x80
This instruction operates on 128-bit registers.
Operation:
For each byte lane i = 0..15, with hi = 8i+7 and lo = 8i:
if ((GPR[rs]hi..lo - GPR[rt]hi..lo) > 0x7F) then
    GPR[rd]hi..lo ← 0x7F
else if (0x100 <= (GPR[rs]hi..lo - GPR[rt]hi..lo) < 0x180) then
    GPR[rd]hi..lo ← 0x80
else
    GPR[rd]hi..lo ← (GPR[rs]hi..lo - GPR[rt]hi..lo)7..0
endif
[Figure: each byte B15..B0 of rt is subtracted from the corresponding byte A15..A0 of rs; rd holds A15-B15 .. A0-B0, each saturated to a signed byte.]
Exceptions:
None
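The saturation rule for PSUBSB can be sketched in C by computing each difference in a wider type and clamping it. This sketch reads the second condition of the pseudocode (0x100 <= difference < 0x180) as the 9-bit two's-complement encoding of differences below -128; that reading is an assumption, as are the names sub_sat_s8, gpr128_sb and psubsb.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as sixteen signed bytes; b[0] = bits 7..0. */
typedef struct { int8_t b[16]; } gpr128_sb;

/* Signed 8-bit subtraction with saturation to 0x7F / 0x80. */
static int8_t sub_sat_s8(int8_t a, int8_t b)
{
    int diff = (int)a - (int)b;               /* exact: range is -255..255 */
    if (diff > INT8_MAX) return INT8_MAX;     /* overflow  -> 0x7F */
    if (diff < INT8_MIN) return INT8_MIN;     /* underflow -> 0x80 */
    return (int8_t)diff;
}

/* Models PSUBSB rd, rs, rt lane by lane. */
static gpr128_sb psubsb(gpr128_sb rs, gpr128_sb rt)
{
    gpr128_sb rd;
    for (int i = 0; i < 16; i++) {
        rd.b[i] = sub_sat_s8(rs.b[i], rt.b[i]);
    }
    return rd;
}
```

The same pattern extends to the halfword form by widening the lane type and the clamp limits.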
PSUBSH    Parallel Subtract with Signed Saturation Halfword    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSUBSH 10101  MMI0 001000
Width:  6           5       5       5       5             6

Format: PSUBSH rd, rs, rt
Purpose: To subtract 8 pairs of 16-bit signed integers with saturation in parallel.
Description: rd ← rs - rt
The eight signed halfwords in GPR rt are subtracted from the corresponding eight signed halfwords in GPR rs in parallel. The results are placed into the corresponding eight halfwords in GPR rd. No overflow or underflow exceptions are generated under any circumstances. Results beyond the range of a signed halfword value are saturated according to the following:
Overflow: 0x7FFF
Underflow: 0x8000
This instruction operates on 128-bit registers.
Operation:
For each halfword lane i = 0..7, with hi = 16i+15 and lo = 16i:
if ((GPR[rs]hi..lo - GPR[rt]hi..lo) > 0x7FFF) then
    GPR[rd]hi..lo ← 0x7FFF
else if (0x10000 <= (GPR[rs]hi..lo - GPR[rt]hi..lo) < 0x18000) then
    GPR[rd]hi..lo ← 0x8000
else
    GPR[rd]hi..lo ← (GPR[rs]hi..lo - GPR[rt]hi..lo)15..0
endif
[Figure: each halfword B7..B0 of rt is subtracted from the corresponding halfword A7..A0 of rs; rd holds A7-B7 .. A0-B0, each saturated to a signed halfword.]
Exceptions:
None
PSUBSW    Parallel Subtract with Signed Saturation Word    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSUBSW 10001  MMI0 001000
Width:  6           5       5       5       5             6

Format: PSUBSW rd, rs, rt
Purpose: To subtract 4 pairs of 32-bit signed integers with saturation in parallel.
Description: rd ← rs - rt
The four signed words in GPR rt are subtracted from the corresponding four signed words in GPR rs in parallel. The results are placed into the corresponding four words in GPR rd. No overflow or underflow exceptions are generated under any circumstances. Results beyond the range of a signed word value are saturated according to the following:
Overflow: 0x7FFFFFFF
Underflow: 0x80000000
This instruction operates on 128-bit registers.
Operation:
For each word lane i = 0..3, with hi = 32i+31 and lo = 32i:
if ((GPR[rs]hi..lo - GPR[rt]hi..lo) > 0x7FFFFFFF) then
    GPR[rd]hi..lo ← 0x7FFFFFFF
else if (0x100000000 <= (GPR[rs]hi..lo - GPR[rt]hi..lo) < 0x180000000) then
    GPR[rd]hi..lo ← 0x80000000
else
    GPR[rd]hi..lo ← (GPR[rs]hi..lo - GPR[rt]hi..lo)31..0
endif
[Figure: each word B3..B0 of rt is subtracted from the corresponding word A3..A0 of rs; rd holds A3-B3 .. A0-B0, each saturated to a signed word.]
Exceptions:
None
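For the word form, the difference of two 32-bit signed values needs 33 bits, which matches the 0x100000000 and 0x180000000 bounds in the pseudocode. A hedged C sketch using a 64-bit intermediate is shown below; gpr128_sw, sub_sat_s32 and psubsw are hypothetical names, and w[0] is assumed to hold bits 31..0.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as four signed words; w[0] = bits 31..0. */
typedef struct { int32_t w[4]; } gpr128_sw;

/* Signed 32-bit subtraction with saturation to 0x7FFFFFFF / 0x80000000. */
static int32_t sub_sat_s32(int32_t a, int32_t b)
{
    int64_t diff = (int64_t)a - (int64_t)b;   /* exact in 64 bits */
    if (diff > INT32_MAX) return INT32_MAX;
    if (diff < INT32_MIN) return INT32_MIN;
    return (int32_t)diff;
}

/* Models PSUBSW rd, rs, rt lane by lane. */
static gpr128_sw psubsw(gpr128_sw rs, gpr128_sw rt)
{
    gpr128_sw rd;
    for (int i = 0; i < 4; i++) {
        rd.w[i] = sub_sat_s32(rs.w[i], rt.w[i]);
    }
    return rd;
}
```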
PSUBUB    Parallel Subtract with Unsigned Saturation Byte    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSUBUB 11001  MMI1 101000
Width:  6           5       5       5       5             6

Format: PSUBUB rd, rs, rt
Purpose: To subtract 16 pairs of 8-bit unsigned integers with saturation in parallel.
Description: rd ← rs - rt
The sixteen unsigned bytes in GPR rt are subtracted from the corresponding sixteen unsigned bytes in GPR rs in parallel. The results are placed into the corresponding sixteen bytes in GPR rd. No underflow exceptions are generated under any circumstances. Results beyond the range of an unsigned byte value are saturated according to the following:
Underflow: 0x00
This instruction operates on 128-bit registers.
Operation:
For each byte lane i = 0..15, with hi = 8i+7 and lo = 8i:
if ((GPR[rs]hi..lo - GPR[rt]hi..lo) < 0x00) then
    GPR[rd]hi..lo ← 0x00
else
    GPR[rd]hi..lo ← (GPR[rs]hi..lo - GPR[rt]hi..lo)7..0
endif
[Figure: each byte B15..B0 of rt is subtracted from the corresponding byte A15..A0 of rs; rd holds A15-B15 .. A0-B0, each saturated to an unsigned byte.]
Exceptions:
None
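With unsigned operands only underflow is possible, so the saturation in PSUBUB reduces to a comparison before the subtraction. A minimal C sketch, assuming hypothetical names gpr128_ub and psubub and lane b[0] holding bits 7..0:

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as sixteen unsigned bytes; b[0] = bits 7..0. */
typedef struct { uint8_t b[16]; } gpr128_ub;

/* Models PSUBUB rd, rs, rt: a negative difference is clamped to 0x00. */
static gpr128_ub psubub(gpr128_ub rs, gpr128_ub rt)
{
    gpr128_ub rd;
    for (int i = 0; i < 16; i++) {
        rd.b[i] = (rs.b[i] < rt.b[i]) ? 0
                                      : (uint8_t)(rs.b[i] - rt.b[i]);
    }
    return rd;
}
```

This pattern is commonly used for absolute-difference style computations on pixel data, where a negative result should read as zero rather than wrap to a large value.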
PSUBUH    Parallel Subtract with Unsigned Saturation Halfword    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSUBUH 10101  MMI1 101000
Width:  6           5       5       5       5             6

Format: PSUBUH rd, rs, rt
Purpose: To subtract 8 pairs of 16-bit unsigned integers with saturation in parallel.
Description: rd ← rs - rt
The eight unsigned halfwords in GPR rt are subtracted from the corresponding eight unsigned halfwords in GPR rs in parallel. The results are placed into the corresponding eight halfwords in GPR rd. No underflow exceptions are generated under any circumstances. Results beyond the range of an unsigned halfword value are saturated according to the following:
Underflow: 0x0000
This instruction operates on 128-bit registers.
Operation:
For each halfword lane i = 0..7, with hi = 16i+15 and lo = 16i:
if ((GPR[rs]hi..lo - GPR[rt]hi..lo) < 0x0000) then
    GPR[rd]hi..lo ← 0x0000
else
    GPR[rd]hi..lo ← (GPR[rs]hi..lo - GPR[rt]hi..lo)15..0
endif
[Figure: each halfword B7..B0 of rt is subtracted from the corresponding halfword A7..A0 of rs; rd holds A7-B7 .. A0-B0, each saturated to an unsigned halfword.]
Exceptions:
None
PSUBUW    Parallel Subtract with Unsigned Saturation Word    C790

Bits:   31..26      25..21  20..16  15..11  10..6         5..0
Field:  MMI 011100  rs      rt      rd      PSUBUW 10001  MMI1 101000
Width:  6           5       5       5       5             6

Format: PSUBUW rd, rs, rt
Purpose: To subtract 4 pairs of 32-bit unsigned integers with saturation in parallel.
Description: rd ← rs - rt
The four unsigned words in GPR rt are subtracted from the corresponding four unsigned words in GPR rs in parallel. The results are placed into the corresponding four words in GPR rd. No underflow exceptions are generated under any circumstances. Results beyond the range of an unsigned word value are saturated according to the following:
Underflow: 0x00000000
This instruction operates on 128-bit registers.
Operation:
For each word lane i = 0..3, with hi = 32i+31 and lo = 32i:
if ((GPR[rs]hi..lo - GPR[rt]hi..lo) < 0x00000000) then
    GPR[rd]hi..lo ← 0x00000000
else
    GPR[rd]hi..lo ← (GPR[rs]hi..lo - GPR[rt]hi..lo)31..0
endif
[Figure: each word B3..B0 of rt is subtracted from the corresponding word A3..A0 of rs; rd holds A3-B3 .. A0-B0, each saturated to an unsigned word.]
Exceptions:
None
PSUBW    Parallel Subtract Word    C790

Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      PSUBW 00001  MMI0 001000
Width:  6           5       5       5       5            6

Format: PSUBW rd, rs, rt
Purpose: To subtract 4 pairs of 32-bit integers in parallel.
Description: rd ← rs - rt
The four signed words in GPR rt are subtracted from the corresponding four words in GPR rs in parallel. The results are placed into the corresponding four words in GPR rd. No overflow or underflow exceptions are generated under any circumstances. This instruction operates on 128-bit registers.
Operation:
GPR[rd]31..0 ← (GPR[rs]31..0 - GPR[rt]31..0)31..0
GPR[rd]63..32 ← (GPR[rs]63..32 - GPR[rt]63..32)31..0
GPR[rd]95..64 ← (GPR[rs]95..64 - GPR[rt]95..64)31..0
GPR[rd]127..96 ← (GPR[rs]127..96 - GPR[rt]127..96)31..0
[Figure: each word B3..B0 of rt is subtracted from the corresponding word A3..A0 of rs; rd holds A3-B3 .. A0-B0.]
Exceptions:
None
PXOR    Parallel Exclusive OR    C790

Bits:   31..26      25..21  20..16  15..11  10..6       5..0
Field:  MMI 011100  rs      rt      rd      PXOR 10011  MMI2 001001
Width:  6           5       5       5       5           6

Format: PXOR rd, rs, rt
Purpose: To do a bitwise logical EXCLUSIVE OR.
Description: rd ← rs XOR rt
The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical exclusive OR operation. The result is placed into GPR rd. This instruction operates on 128-bit registers.
Operation:
GPR[rd]127..0 ← GPR[rs]127..0 xor GPR[rt]127..0
[Figure: doublewords A1, A0 of rs are XORed with doublewords B1, B0 of rt; rd holds A1 XOR B1 and A0 XOR B0.]
Exceptions:
None
QFSRV    Quadword Funnel Shift Right Variable    C790

Bits:   31..26      25..21  20..16  15..11  10..6        5..0
Field:  MMI 011100  rs      rt      rd      QFSRV 11011  MMI1 101000
Width:  6           5       5       5       5            6

Format: QFSRV rd, rs, rt
Purpose: To right shift a quadword by a variable number of bits.
Description: rd ← (rs, rt) >> SA
The content of GPR rt is concatenated with the content of GPR rs, producing the intermediate result rs:rt. This value is shifted right by the number of bits specified in the shift amount register SA. The least significant 16 bytes (i.e. quadword) of the shifted result are placed into GPR rd.
Restriction:
Note that SA can be loaded only with byte shift values (MTSAB) or halfword shift values (MTSAH); i.e. with bit shift amounts that are multiples of 8 or 16. This instruction operates on 128-bit registers.
Operation:
if (SA == 0) then
    GPR[rd]127..0 ← GPR[rt]127..0
else
    GPR[rd]127..0 ← GPR[rs](SA-1)..0 || GPR[rt]127..SA
endif
Programming Note:
1. A left funnel shift by an amount of s bytes can be done by setting SA to 16-s using the MTSAB instruction, provided that s is not 0. Similarly, a left funnel shift by s halfwords can be done by setting SA to 8-s using the MTSAH instruction, provided that s is not 0. A quick way to perform this computation is as follows:
    // Register %sal contains the left shift amount
    subi  %samt, %sal, 1
    mtsab %samt, -1
    // Following QFSRV does a shift left by %sal bytes
    qfsrv %dst, %src1, %src2
2. QFSRV can be used to rotate a 128-bit quantity r by setting both source operands rs and rt to register r. For example, the following code sequence rotates right the value in wide register %5 by 3 halfwords (i.e. 48 bits), and deposits the result in wide register %6.
    mtsah %0, 3
    qfsrv %6, %5, %5
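The funnel shift and the two programming notes can be checked against a small byte-oriented C model. This is a sketch under stated assumptions rather than a definitive implementation: gpr128_b and qfsrv are hypothetical names, b[0] is assumed to hold bits 7..0, and the SA register is passed in as a plain bit count that must be a multiple of 8 below 128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit GPR as sixteen bytes; b[0] = bits 7..0. */
typedef struct { uint8_t b[16]; } gpr128_b;

/* Models QFSRV rd, rs, rt with the SA register value given in bits. */
static gpr128_b qfsrv(gpr128_b rs, gpr128_b rt, unsigned sa_bits)
{
    assert(sa_bits % 8 == 0 && sa_bits < 128);
    unsigned n = sa_bits / 8;                  /* shift amount in bytes */
    gpr128_b rd;
    for (unsigned i = 0; i < 16; i++) {
        unsigned src = i + n;                  /* byte index within rs:rt */
        /* Bytes 0..15 of rs:rt come from rt, bytes 16..31 from rs. */
        rd.b[i] = (src < 16) ? rt.b[src] : rs.b[src - 16];
    }
    return rd;
}
```

Calling qfsrv(r, r, 48) rotates r right by three halfwords as in note 2, and qfsrv(hi, lo, (16 - s) * 8) gives the left funnel shift by s bytes described in note 1.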
SQ
31 26 25 21 20
Store Quadword 16 15 0
SQ
offset
16
C790
SQ 011111
6
base
5
rt
5
Format: SQ rt, offset (base)
Purpose: To store a quadword to memory.
Description: memory [base + offset] ← rt
The 128-bit quadword in GPR rt is stored in memory at the location specified by the effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. The least significant four bits of the effective address are masked to zero (effectively creating an aligned address) before being used to access memory. No address exceptions due to alignment are possible.
Restrictions:
The effective address doesn't have to be naturally aligned. The least significant 4 bits of the effective address are ignored.
Operation:
  vAddr ← sign_extend (offset) + GPR[base]31..0
  vAddr3..0 = 0^4
  (pAddr, uncached) ← AddressTranslation (vAddr, DATA, STORE)
  quadword ← GPR[rt]127..0
  StoreMemory (uncached, QUADWORD, quadword, pAddr, vAddr, DATA)
Exceptions:
  TLB Refill
  TLB Invalid
  Address Error
Programming Notes:
None
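
The address formation in the SQ Operation can be sketched in Python as follows. This is a minimal model with a hypothetical function name, assuming a 32-bit virtual address as written in the Operation; it ignores address translation and the store itself.

```python
def sq_effective_address(base: int, offset: int) -> int:
    """Sign-extend the 16-bit offset, add GPR[base]31..0, clear bits 3..0."""
    offset &= 0xFFFF
    if offset & 0x8000:           # negative 16-bit offset
        offset -= 0x10000
    vaddr = (base + offset) & 0xFFFFFFFF
    return vaddr & ~0xF           # force 16-byte alignment

# An unaligned address is silently rounded down to the enclosing quadword.
aligned = sq_effective_address(0x1000, 0x001F)
```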
B.5 C790-Specific Instruction Encoding
  Bits     Field
  31..26   OpCode
  25..0    (remaining instruction fields)

Instructions encoded by OpCode field (MMI, LQ, SQ)

            OpCode bits 28..26
  bits      0        1        2      3      4      5      6      7
  31..29    000      001      010    011    100    101    110    111
  0 000     SPECIAL  REGIMM   J      JAL    BEQ    BNE    BLEZ   BGTZ
  1 001     ADDI     ADDIU    SLTI   SLTIU  ANDI   ORI    XORI   LUI
  2 010     COP0     COP1     *      *      BEQL   BNEL   BLEZL  BGTZL
  3 011     DADDI    DADDIU   LDL    LDR    MMI    *      LQ     SQ
  4 100     LB       LH       LWL    LW     LBU    LHU    LWR    LWU
  5 101     SB       SH       SWL    SW     SDL    SDR    SWR    CACHE
  6 110              LWC1            PREF          LDC1          LD
  7 111              SWC1            *             SDC1          SD

  Bits     Field
  31..26   OpCode = MMI
  5..0     function

Instructions encoded by function field when OpCode field = MMI

            function bits 2..0
  bits      0       1        2       3       4       5     6       7
  5..3      000     001      010     011     100     101   110     111
  0 000     MADD    MADDU    *       *       PLZCW   *     *       *
  1 001     MMI0    MMI2     *       *       *       *     *       *
  2 010     MFHI1   MTHI1    MFLO1   MTLO1   *       *     *       *
  3 011     MULT1   MULTU1   DIV1    DIVU1   *       *     *       *
  4 100     MADD1   MADDU1   *       *       *       *     *       *
  5 101     MMI1    MMI3     *       *       *       *     *       *
  6 110     PMFHL   PMTHL    *       *       PSLLH   *     PSRLH   PSRAH
  7 111     *       *        *       *       PSLLW   *     PSRLW   PSRAW

  Bits     Field
  31..26   OpCode = MMI
  10..6    function
  5..0     MMI0

Instructions encoded by function field when OpCode field = MMI & bit 5..0 = MMI0

            function bits 7..6
  bits      0        1        2        3
  10..8     00       01       10       11
  0 000     PADDW    PSUBW    PCGTW    PMAXW
  1 001     PADDH    PSUBH    PCGTH    PMAXH
  2 010     PADDB    PSUBB    PCGTB    *
  3 011     *        *        *        *
  4 100     PADDSW   PSUBSW   PEXTLW   PPACW
  5 101     PADDSH   PSUBSH   PEXTLH   PPACH
  6 110     PADDSB   PSUBSB   PEXTLB   PPACB
  7 111     *        *        PEXT5    PPAC5

  Bits     Field
  31..26   OpCode = MMI
  10..6    function
  5..0     MMI1

Instructions encoded by function field when OpCode field = MMI & bit 5..0 = MMI1

            function bits 7..6
  bits      0        1        2        3
  10..8     00       01       10       11
  0 000     *        PABSW    PCEQW    PMINW
  1 001     PADSBH   PABSH    PCEQH    PMINH
  2 010     *        *        PCEQB    *
  3 011     *        *        *        *
  4 100     PADDUW   PSUBUW   PEXTUW   *
  5 101     PADDUH   PSUBUH   PEXTUH   *
  6 110     PADDUB   PSUBUB   PEXTUB   QFSRV
  7 111     *        *        *        *

  Bits     Field
  31..26   OpCode = MMI
  10..6    function
  5..0     MMI2

Instructions encoded by function field when OpCode field = MMI & bit 5..0 = MMI2

            function bits 7..6
  bits      0        1        2        3
  10..8     00       01       10       11
  0 000     PMADDW   *        PSLLVW   PSRLVW
  1 001     PMSUBW   *        *        *
  2 010     PMFHI    PMFLO    PINTH    *
  3 011     PMULTW   PDIVW    PCPYLD   *
  4 100     PMADDH   PHMADH   PAND     PXOR
  5 101     PMSUBH   PHMSBH   *        *
  6 110     *        *        PEXEH    PREVH
  7 111     PMULTH   PDIVBW   PEXEW    PROT3W

  Bits     Field
  31..26   OpCode = MMI
  10..6    function
  5..0     MMI3

Instructions encoded by function field when OpCode field = MMI & bit 5..0 = MMI3

            function bits 7..6
  bits      0         1        2        3
  10..8     00        01       10       11
  0 000     PMADDUW   *        *        PSRAVW
  1 001     *         *        *        *
  2 010     PMTHI     PMTLO    PINTEH   *
  3 011     PMULTUW   PDIVUW   PCPYUD   *
  4 100     *         *        POR      PNOR
  5 101     *         *        *        *
  6 110     *         *        PEXCH    PCPYH
  7 111     *         *        PEXCW    *

* This OpCode is reserved for future use. An attempt to execute it causes a Reserved Instruction exception.
This OpCode indicates an instruction class. The instruction word must be further decoded by examining additional tables that show the values for other instruction fields.
This OpCode is reserved for one of the following instructions which are currently not supported: DMULT, DMULTU, DDIV, DDIVU, LL, LLD, SC, SCD, LWC2, SWC2. An attempt to execute it causes a Reserved Instruction exception.
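
To show how the tables in this section chain together, here is a small Python sketch that decodes an MMI1-class instruction word in three steps: OpCode field, function field, then bits 10..6. The dictionaries copy only a few table entries, and all names (`decode_mmi1`, `MMI_CLASSES`, `MMI1_SUBSET`) are hypothetical.

```python
MMI_OPCODE = 0b011100

# function field (bits 5..0) values that select a further MMI sub-table
MMI_CLASSES = {0b001000: "MMI0", 0b101000: "MMI1",
               0b001001: "MMI2", 0b101001: "MMI3"}

# Partial copy of the MMI1 table, keyed by bits 10..6 (bits 10..8 then 7..6)
MMI1_SUBSET = {0b00001: "PABSW", 0b00010: "PCEQW",
               0b00011: "PMINW", 0b11011: "QFSRV"}

def decode_mmi1(word: int):
    """Return (mnemonic, rd, rs, rt) for an MMI1 word, or None otherwise."""
    opcode = (word >> 26) & 0x3F
    function = word & 0x3F
    if opcode != MMI_OPCODE or MMI_CLASSES.get(function) != "MMI1":
        return None
    sub = (word >> 6) & 0x1F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    # Entries not copied into the subset are reported as "?" here.
    return MMI1_SUBSET.get(sub, "?"), rd, rs, rt

# Encoding of "qfsrv %6, %5, %5" built from the QFSRV field layout.
word = (MMI_OPCODE << 26) | (5 << 21) | (5 << 16) | (6 << 11) | (0b11011 << 6) | 0b101000
```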
Appendix C COP0 System Control Coprocessor Instruction Set Details
C. COP0 System Control Coprocessor Instruction Set Details
This appendix provides a detailed description of the operation of each System Control Coprocessor (COP0) instruction. COP0 instructions perform operations specifically on the System Control Coprocessor registers to manipulate the memory management and exception handling facilities of the processor. COP0 instructions are enabled if the processor is in Kernel mode, or if bit 28 (CU[0]) is set in the Status register. Otherwise, executing one of these instructions generates a Coprocessor Unusable exception. The only exceptions to this rule are the EI and DI instructions, which never generate Coprocessor Unusable exceptions. When the EDI bit in the Status register is set, the EI and DI instructions operate in User, Supervisor, and Kernel modes regardless of whether the COP0 coprocessor usable bit (Status.CU[0]) is set. When the EDI bit is cleared, EI and DI work as NOPs in User and Supervisor modes, again regardless of Status.CU[0], and execute properly in Kernel mode.
BC0F    Branch on Coprocessor 0 False    MIPS I

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   BC0 01000      5
  20..16   BC0F 00000     5
  15..0    offset         16

Format: BC0F offset
Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor 0's condition signal, as sampled during the previous instruction, is false, then the program branches to the target address with a delay of one instruction.
Restrictions: Because the coprocessor 0 condition is externally supplied, there is no way to synchronize the change/update of the condition and the execution of this instruction.
Operation:
  I:   tgt_offset ← sign_extend (offset || 0^2)
       condition ← not CPCOND0
  I+1: if condition then
         PC ← PC + tgt_offset
       endif
Exceptions: Coprocessor Unusable exception
BC0FL    Branch on Coprocessor 0 False Likely    MIPS II

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   BC0 01000      5
  20..16   BC0FL 00010    5
  15..0    offset         16

Format: BC0FL offset
Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor 0's condition signal, as sampled during the previous instruction, is false, the program branches to the target address with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.
Restrictions: Because the coprocessor 0 condition is externally supplied, there is no way to synchronize the change/update of the condition and the execution of this instruction.
Operation:
  I:   tgt_offset ← sign_extend (offset || 0^2)
       condition ← not CPCOND0
  I+1: if condition then
         PC ← PC + tgt_offset
       else
         NullifyCurrentInstruction()
       endif
Exceptions: Coprocessor Unusable exception
BC0T    Branch on Coprocessor 0 True    MIPS I

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   BC0 01000      5
  20..16   BC0T 00001     5
  15..0    offset         16

Format: BC0T offset
Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor 0's condition signal is true, then the program branches to the target address, with a delay of one instruction.
Restrictions: Because the coprocessor 0 condition is externally supplied, there is no way to synchronize the change/update of the condition and the execution of this instruction.
Operation:
  I:   tgt_offset ← sign_extend (offset || 0^2)
       condition ← CPCOND0
  I+1: if condition then
         PC ← PC + tgt_offset
       endif
Exceptions: Coprocessor Unusable exception
BC0TL    Branch on Coprocessor 0 True Likely    MIPS II

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   BC0 01000      5
  20..16   BC0TL 00011    5
  15..0    offset         16

Format: BC0TL offset
Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor 0's condition signal, as sampled during the previous instruction, is true, the program branches to the target address with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.
Restrictions: Because the coprocessor 0 condition is externally supplied, there is no way to synchronize the change/update of the condition and the execution of this instruction.
Operation:
  I:   tgt_offset ← sign_extend (offset || 0^2)
       condition ← CPCOND0
  I+1: if condition then
         PC ← PC + tgt_offset
       else
         NullifyCurrentInstruction()
       endif
Exceptions: Coprocessor Unusable exception
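
The four BC0 branch variants differ only in the sense of the condition and in whether the delay slot is nullified when the branch is not taken. The following Python sketch models that, assuming a 32-bit PC and 4-byte instructions; the function names and the tuple return value are hypothetical conveniences, not part of the architecture definition.

```python
def branch_offset(offset16: int) -> int:
    """sign_extend(offset || 0^2): shift left two bits, then sign-extend."""
    value = (offset16 & 0xFFFF) << 2
    return value - (1 << 18) if value & (1 << 17) else value

def bc0_next_pc(branch_pc: int, offset16: int, cpcond0: bool,
                on_true: bool, likely: bool):
    """Return (pc_after_delay_slot, delay_slot_executed).

    on_true selects BC0T/BC0TL (True) or BC0F/BC0FL (False);
    likely selects the L variants, which nullify an untaken delay slot.
    """
    delay_slot = (branch_pc + 4) & 0xFFFFFFFF
    condition = cpcond0 if on_true else not cpcond0
    if condition:
        return (delay_slot + branch_offset(offset16)) & 0xFFFFFFFF, True
    return (delay_slot + 4) & 0xFFFFFFFF, not likely
```

Note that the offset is relative to the delay slot address, not to the branch itself, so an offset of -1 (0xFFFF) targets the branch instruction.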
CACHE    Cache    R4000

  Bits     Field             Width
  31..26   CACHE 101111      6
  25..21   base              5
  20..16   op (See table)    5
  15..0    offset            16

Format: CACHE op, offset (base)
Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address (VA). The VA is translated to a physical address (PA) through the memory management unit and its TLB, and the 5-bit OpCode (decoded in the table below) specifies a cache operation for that address, together with the affected cache. Operation of this instruction on any combination not listed in the table below is undefined. The operation of this instruction on uncached and uncached accelerated addresses is also undefined unless it is an index-type sub-operation.

Table C-1. CACHE Instruction Op Field Encoding

  Mnemonic   OpCode   CACHE Instruction              Target
  IXIN       00111    INDEX INVALIDATE               Instruction Cache
  IXLTG      00000    INDEX LOAD TAG                 Instruction Cache
  IXSTG      00100    INDEX STORE TAG                Instruction Cache
  IHIN       01011    HIT INVALIDATE                 Instruction Cache
  IFL        01110    FILL                           Instruction Cache
  IXLDT      00001    INDEX LOAD DATA                Instruction Cache
  IXSDT      00101    INDEX STORE DATA               Instruction Cache
  BXLBT      00010    INDEX LOAD BTAC                BTAC
  BXSBT      00110    INDEX STORE BTAC               BTAC
  BFH        01100    BTAC FLUSH                     BTAC
  BHINBT     01010    HIT INVALIDATE BTAC            BTAC
  DXWBIN     10100    INDEX WRITE BACK INVALIDATE    Data Cache
  DXLTG      10000    INDEX LOAD TAG                 Data Cache
  DXSTG      10010    INDEX STORE TAG                Data Cache
  DXIN       10110    INDEX INVALIDATE               Data Cache
  DHIN       11010    HIT INVALIDATE                 Data Cache
  DHWBIN     11000    HIT WRITEBACK INVALIDATE       Data Cache
  DXLDT      10001    INDEX LOAD DATA                Data Cache
  DXSDT      10011    INDEX STORE DATA               Data Cache
  DHWOIN     11100    HIT WRITEBACK W/O INVALIDATE   Data Cache

Operation:
  vAddr ← ((offset15)^16 || offset15..0) + GPR[base]31..0
  (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
  CacheOp (op, vAddr, pAddr)
Exceptions:
  Coprocessor Unusable exception
  TLB Refill
  TLB Invalid
  Address Error
C.1.1 Notes on the CACHE Instruction Sub-operations

Cache Virtual Address
The CACHE instruction uses the following portions of the Virtual Address (VA) computed by adding the offset to the base to specify a cache block and way:
* VA[13:6] defines a 64-byte line in the data cache array
* VA[13:6] defines a 64-byte line in the instruction cache array
* In both cases, VA[0] defines the way needed by Index sub-operations
When accessing data in the caches, VA[13:2] is used to read or write a specific data word in the data cache and VA[13:2] is used to read or write a specific instruction in the instruction cache.

Cache Physical Address
The CACHE instruction computes the Physical Address (PA) to access memory for cache Hit Invalidate (I) and Fill (I) sub-operations in the following manner:
* VA[31:6] is computed from the CACHE instruction by adding the offset to the base and then the result is translated to produce PA[31:6]
The CACHE instruction computes the Physical Address (PA) to access memory for cache Hit Invalidate (D), Hit Writeback Invalidate (D), Hit Writeback Without Invalidate (D) sub-operations in the following manner:
* VA[31:6] is computed from the CACHE instruction by adding the offset to the base and then the result is translated to produce PA[31:6]

BTAC Virtual Address
The CACHE instruction uses the following portions of the Virtual Address (VA) computed by adding the offset to the base to check if there is an entry that matches the VA:
* VA[31:3] defines an entry in the BTAC

BTAC Index Bits
Since the BTAC has 64 entries, VA[5:0], computed from the CACHE instruction by adding the offset to the base, is used to index the BTAC.

COP0 Not Usable
If COP0 is not usable (if not in Kernel mode, Status.CU0 must be set for COP0 to be usable), a Coprocessor Unusable exception is taken.
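
The bit ranges listed in these notes can be pulled out of a virtual address with shifts and masks. The Python sketch below is illustrative only; the function name and dictionary keys are hypothetical labels for the fields described above.

```python
def cache_va_fields(va: int) -> dict:
    """Split a CACHE virtual address into the fields named in the notes."""
    va &= 0xFFFFFFFF
    return {
        "line_index": (va >> 6) & 0xFF,        # VA[13:6], 64-byte line
        "way": va & 0x1,                       # VA[0], used by Index sub-operations
        "word_index": (va >> 2) & 0xFFF,       # VA[13:2], word or instruction
        "btac_index": va & 0x3F,               # VA[5:0], one of 64 BTAC entries
        "btac_match": (va >> 3) & 0x1FFFFFFF,  # VA[31:3], compared on BTAC hits
    }

fields = cache_va_fields(0x00003FC1)
```

Because VA[0] selects the way, Index sub-operations on the second way use an odd address even though the line itself is 64-byte aligned.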
TLB Exceptions on Cache Operations
TLB Refill and TLB Invalid exceptions can occur only for the following sub-operations:
1. Hit Invalidate (I)
2. Fill (I)
3. Hit Invalidate (D)
4. Hit Writeback Invalidate (D)
5. Hit Writeback without Invalidate (D)
The TLB Modified exception is never generated.

Hit Sub-operation Accesses
A Hit sub-operation accesses the specified cache as a normal data reference, and performs the specified operation if the cache line contains valid data at the specified physical address (a hit). The operation is undefined if a CACHE sub-operation hit occurs in both ways of the cache.

Breakpoint Exception
Breakpoint exceptions can not be generated by any of the CACHE sub-operations (note that an Instruction Address Breakpoint can still be done on the CACHE instruction itself).

Address Error Exception
None of the CACHE sub-operations will generate an Address Error exception due to misalignment of the VA created by the CACHE instruction as described above. The following CACHE sub-operations can generate privilege-type Address Error exceptions:
1. Hit Invalidate (I)
2. Fill (I)
3. Hit Invalidate (D)
4. Hit Writeback Invalidate (D)
5. Hit Writeback without Invalidate (D)
C.1.2 Sub-Operation Descriptions

Note on Cache Enable Status
All instruction cache related sub-operations perform their function regardless of the value of the ICE bit of the Config register (i.e., regardless of whether the instruction cache is enabled or not). All data cache related sub-operations perform their function regardless of the value of the DCE bit of the Config register (i.e., regardless of whether the data cache is enabled or not). All BTAC-related sub-operations perform their function regardless of the value of the BPE bit of the Config register.
Op = 00111 Index Invalidate (I)
Index Invalidate (I) sets a line in the instruction cache to Invalid. VA[13:6] defines the index of the line and VA[0] defines the way to be invalidated. The LRF bit does not change.
Op = 00000 Index Load Tag (I)
Index Load Tag (I) reads the instruction cache tag array fields into the COP0 TagLO register. VA[13:6] defines the index and VA[0] defines the way of the tag to be read. The following mapping defines the sub-operation:
* TagLO[4] = LRF bit
* TagLO[5] = VALID bit
* TagLO[31:12] = Tag[19:0]
All other TagLO bits are undefined.
Op = 00100 Index Store Tag (I)
Index Store Tag (I) stores the COP0 TagLO register into the instruction cache tag array. VA[13:6] defines the index and VA[0] defines the way of the tag to be written. The following mapping defines the sub-operation:
* LRF bit = TagLO[4]
* VALID bit = TagLO[5]
* Tag[19:0] = TagLO[31:12]
Note that it is perfectly feasible to invalidate the cache line using this sub-operation.
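
As a minimal sketch of the TagLO layout used by Index Load Tag (I) and Index Store Tag (I), the Python helpers below pack and unpack the three defined fields. The helper names are hypothetical, and the undefined TagLO bits are simply written as zero here.

```python
def pack_icache_taglo(tag: int, valid: bool, lrf: bool) -> int:
    """Build a TagLO value: Tag[19:0] in bits 31..12, VALID in bit 5, LRF in bit 4."""
    return ((tag & 0xFFFFF) << 12) | (int(valid) << 5) | (int(lrf) << 4)

def unpack_icache_taglo(taglo: int) -> dict:
    """Recover the fields that Index Load Tag (I) places in TagLO."""
    return {
        "tag": (taglo >> 12) & 0xFFFFF,
        "valid": bool(taglo & (1 << 5)),
        "lrf": bool(taglo & (1 << 4)),
    }

# Writing a TagLO value with VALID = 0 invalidates the addressed line.
invalidating_value = pack_icache_taglo(0xABCDE, valid=False, lrf=False)
```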
Op = 01011 Hit Invalidate (I)
Hit Invalidate (I) invalidates a line in the instruction cache which matches the PA[31:6] computed from the CACHE instruction. Both way tags at VA[13:6] are read from the instruction cache. If the Valid bit of one of the entries is a 1 and the PA of the CACHE instruction matches the Tag from that entry of the instruction cache tag array, the Valid bit of the entry is changed to a 0 (Invalid). The LRF bit does not change. This sub-operation also invalidates BTAC entries which match VA[31:6].
Op = 01110 Fill (I)
Fill (I) brings in a cache line from memory and stores it in the instruction cache. The following sequence is followed:
1. The PA computed from the CACHE instruction is used to fetch the cache line from memory.
2. The line is loaded into the cache line addressed by VA[13:6] and the way of cache is defined by the rules of the LRF bits.
3. The corresponding instruction cache tag is loaded with the PFN and the entry is validated.
Op = 00001 Index Load Data (I)
Index Load Data (I) reads a single instruction from the instruction cache data array and stores it into the COP0 TagLO and TagHI registers. VA[13:2] defines the index and VA[0] defines the way of the instruction cache to be read. The following mapping defines the sub-operation:
* TagLO[31:0] = 32-bit instruction
* TagHI[3:0] = SteeringBits[3:0]
* TagHI[5:4] = BHT[1:0]
All other TagHI bits are undefined.
Op = 00101 Index Store Data (I)
Index Store Data (I) stores the COP0 TagLO and TagHI registers into the instruction cache data array. VA[13:2] defines the index and VA[0] defines the way of the instruction cache to be written. The following mapping defines the sub-operation:
* 32-bit instruction = TagLO[31:0]
* SteeringBits[3:0] = TagHI[3:0]
* BHT[1:0] = TagHI[5:4]
The BHT[1:0] bits are associated with the instruction pair at VA[13:3]. This sub-operation invalidates all BTAC entries.
Op = 00010 Index Load BTAC (B)
Index Load BTAC (B) reads a single BTAC entry and stores it into the COP0 TagLO and TagHI registers. VA[5:0] defines the index of the BTAC entry to be read. The following mapping defines the sub-operation:
* TagLO[0] = Valid Bit
* TagLO[31:3] = FetchAddress[28:0]
* TagHI[31:2] = TargetAddress[29:0]
All other TagLO and TagHI bits are undefined.
Op = 00110 Index Store BTAC (B)
Index Store BTAC (B) stores the COP0 TagLO and TagHI registers into a single BTAC entry. VA[5:0] defines the index of the BTAC entry to be written. The following mapping defines the sub-operation:
* Valid Bit = TagLO[0]
* FetchAddress[28:0] = TagLO[31:3]
* TargetAddress[29:0] = TagHI[31:2]
Op = 01100 BTAC Flush (B)
This sub-operation invalidates the complete BTAC by writing a 0 into the valid bits of all the entries of the BTAC.
Op = 01010 Hit Invalidate BTAC (B)
Hit Invalidate BTAC (B) invalidates an entry in the BTAC which matches the VA[31:3] computed from the CACHE instruction. If the VA[31:3] matches an entry in the BTAC and its Valid bit is equal to 1 then the Valid bit is changed to a 0. The result is undefined if more than one entry matches the VA.
Op = 10100 Index Writeback Invalidate (D)
Index Writeback Invalidate (D) sub-operation sets a cache line in the data cache to Invalid and writes back any dirty data to the CPU bus. VA[13:6] defines the index and VA[0] defines the way of the data cache line to be invalidated. The invalidation takes place by writing a 0 to the Valid bit. The LRF bit does not change. The PA where the cache line will be written to is calculated by appending VA[11:6] to the 20-bit PFN field from the data cache tag to form PA[31:6]. This address represents a cache line address.
Op = 10000 Index Load Tag (D)
Index Load Tag (D) reads the data cache tag array fields into the COP0 TagLO register. VA[13:6] defines the index and VA[0] defines the way of the tag to be read. The following mapping defines the sub-operation:
* TagLO[3] = Lock bit
* TagLO[4] = LRF bit
* TagLO[5] = Valid bit
* TagLO[6] = Dirty bit
* TagLO[31:12] = Tag[31:12]
All other TagLO bits are undefined.
Op = 10010 Index Store Tag (D)
Index Store Tag (D) stores the COP0 TagLO register into the data cache tag array. VA[13:6] defines the index and VA[0] defines the way of the tag to be written. The following mapping defines the sub-operation:
* Lock bit = TagLO[3]
* LRF bit = TagLO[4]
* Valid bit = TagLO[5]
* Dirty bit = TagLO[6] & TagLO[5]
* Tag[19:0] = TagLO[31:12]
Op = 10110 Index Invalidate (D)
Index Invalidate (D) sets a line in the data cache to Invalid. VA[13:6] defines the index of the line and VA[0] defines the way to be invalidated. The Lock bit, Dirty bit, and Valid bit are changed to zero. The LRF bit doesn't change.
Op = 11010 Hit Invalidate (D)
Hit Invalidate (D) invalidates an entry in the data cache which matches the PA computed from the CACHE instruction. Both way tags at VA[13:6] are read from the data cache. If the Valid bit of the entry is one and the PA of the CACHE instruction matches the Tag from the data cache tag array, the Valid bit of the entry is changed to zero (Invalid). The Lock bit and Dirty bit are also changed to zero. The LRF bit does not change.
Op = 11000 Hit Writeback Invalidate (D)
Hit Writeback Invalidate (D) sub-operation invalidates an entry in the data cache which matches the PA computed from the CACHE instruction. Additionally it writes back any dirty data to the CPU bus. Both way tags at VA[13:6] are read from the data cache. The Lock bit, Dirty bit, and Valid bit are changed to zero. The LRF bits are not modified. If the PA computed from the CACHE instruction matches the tag from the data cache tag array and the Valid bit is 1 then the Valid bit is changed to 0. Furthermore, if the Dirty bit is 1 then the cache line is written to the physical address calculated by appending VA[11:6] to the 20-bit PFN field from the data cache tag to form PA[31:6]. This address represents a cache line physical address.
Op = 10001 Index Load Data (D)
Index Load Data (D) reads a single word from the data cache data array and stores it into the COP0 TagLO register. VA[13:2] defines the index and VA[0] defines the way of the data cache to be read. The following mapping defines the sub-operation:
* TagLO[31:0] = 32-bit data
Op = 10011 Index Store Data (D)
Index Store Data (D) stores the COP0 TagLO register into the data cache data array. VA[13:2] defines the index and VA[0] defines the way of the data cache to be written. The following mapping defines the sub-operation:
* 32-bit data = TagLO[31:0]
Op = 11100 Hit Writeback Without Invalidate (D)
Hit Writeback Without Invalidate (D) sub-operation writes back any dirty data to the CPU bus. Both way tags at VA[13:6] are read from the data cache. The Dirty bit is changed to zero. The LRF bits are not modified. If the PA computed from the CACHE instruction matches the tag from the data cache tag array and the Valid and Dirty bits are 1 then the cache line is written to the physical address calculated by appending VA[11:6] to the 20-bit PFN field from the data cache tag to form PA[31:6]. This address represents a cache line physical address.
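
The writeback sub-operations all form the destination address the same way: the 20-bit PFN from the tag supplies PA[31:12] and VA[11:6] supplies PA[11:6]. A minimal Python sketch, with a hypothetical function name, returning the byte address of the 64-byte line:

```python
def writeback_line_address(pfn20: int, va: int) -> int:
    """Form PA[31:6] = PFN[19:0] || VA[11:6]; bits 5..0 of the result are zero."""
    return ((pfn20 & 0xFFFFF) << 12) | (va & 0xFC0)

line_pa = writeback_line_address(0x12345, 0x00003ABC)
```

Only VA[11:6] is taken from the virtual address because VA[13:12] lie above the 4 KB page offset and are already covered by the PFN.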
Programming Notes:
For all CACHE sub-operations which operate on the instruction cache the following programming restrictions have to be followed:
1. A sequence of CACHE instructions has to be directly preceded and followed by a SYNC.P instruction.
2. Each individual FILL sub-operation has to be followed by a SYNC.L instruction.
For all CACHE sub-operations which operate on the data cache the following programming restrictions have to be followed:
1. A sequence of CACHE instructions has to be directly preceded and followed by a SYNC.L instruction.
2. Each of the three WRITEBACK sub-operations has to be individually followed by a SYNC.L instruction.
For all CACHE sub-operations which operate on the BTAC the following programming restrictions have to be followed:
1. A sequence of CACHE instructions has to be directly preceded and followed by a SYNC.P instruction.

C.1.3 Updates of Data Tag Status Bits
The following table summarizes the updates of Data Tag status bits for various Cache suboperations. The values in the table for Hit Writeback Invalidate, Hit Writeback Without Invalidate, and Hit Invalidate only apply if there is a hit in the data cache. If there is no hit, the status bits are unchanged.
Table C-2. Data Tag Status Bit Modifications

  Cache Instruction                  LRF Bit     Lock Bit    Dirty Bit   Valid Bit
  Index Load Data                    unchanged   unchanged   unchanged   unchanged
  Index Store Data                   unchanged   unchanged   unchanged   unchanged
  Index Load Tag                     unchanged   unchanged   unchanged   unchanged
  Index Store Tag                    loaded      loaded      loaded      loaded
  Index Writeback Invalidate         unchanged   cleared     cleared     cleared
  Index Invalidate                   unchanged   cleared     cleared     cleared
  Hit Invalidate                     unchanged   cleared     cleared     cleared
  Hit Writeback Invalidate           unchanged   cleared     cleared     cleared
  Hit Writeback Without Invalidate   unchanged   unchanged   cleared     unchanged

DI    Disable Interrupt    C790

  Bits     Field                   Width
  31..26   COP0 010000             6
  25..21   C0 10000                5
  20..6    0 000 0000 0000 0000    15
  5..0     DI 111001               6

Format: DI
Description: The DI instruction clears the EIE bit in the Status register and disables all interrupts (except NMI and SIO). When the EIE bit is cleared, all interrupts are disabled regardless of the value of the IE bit in the Status register. When the EDI bit in the Status register is set, the DI instruction operates in User, Supervisor, and Kernel modes regardless of whether the COP0 coprocessor usable bit (Status.CU[0]) is set. When the EDI bit is cleared, EI and DI work as NOPs in User and Supervisor modes, again regardless of Status.CU[0], and execute properly in Kernel mode.
Operation:
  if (Status.EDI = 1) || (Status.EXL = 1) || (Status.ERL = 1) || (Status.KSU = 00₂) then
    Status.EIE ← 0
  endif
Exceptions: None
EI    Enable Interrupt    C790

  Bits     Field                   Width
  31..26   COP0 010000             6
  25..21   C0 10000                5
  20..6    0 000 0000 0000 0000    15
  5..0     EI 111000               6

Format: EI
Description: The EI instruction sets the EIE bit in the Status register. When the EIE bit is set, all interrupts are enabled if the IE bit in the Status register is 1, the EXL bit is 0, and the ERL bit is 0. When the EDI bit in the Status register is set, the EI instruction operates in User, Supervisor, and Kernel modes regardless of whether the COP0 coprocessor usable bit (Status.CU[0]) is set. When the EDI bit is cleared, EI and DI work as NOPs in User and Supervisor modes, again regardless of Status.CU[0], and execute properly in Kernel mode.
Operation:
  if (Status.EDI = 1) || (Status.EXL = 1) || (Status.ERL = 1) || (Status.KSU = 00₂) then
    Status.EIE ← 1
  endif
Exceptions:
None
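
The EI and DI Operation sections share one guard condition. The Python sketch below models it on a simplified, hypothetical `Status` record containing only the bits named in the Operation; the value 0b10 used for a non-Kernel KSU in the usage note is just an example of a value other than 00.

```python
from dataclasses import dataclass

@dataclass
class Status:
    EDI: int = 0
    EXL: int = 0
    ERL: int = 0
    KSU: int = 0b00   # 00 selects Kernel mode in the Operation's test
    EIE: int = 0

def _may_update_eie(s: Status) -> bool:
    """Guard shared by EI and DI: EDI set, or the processor is in Kernel mode."""
    return s.EDI == 1 or s.EXL == 1 or s.ERL == 1 or s.KSU == 0b00

def ei(s: Status) -> None:
    if _may_update_eie(s):
        s.EIE = 1      # otherwise EI behaves as a NOP

def di(s: Status) -> None:
    if _may_update_eie(s):
        s.EIE = 0      # otherwise DI behaves as a NOP
```

For example, with `Status(KSU=0b10)` and EDI clear, `ei` leaves EIE unchanged; setting EDI to 1 lets the same call set EIE.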
ERET    Exception Return    R4000

  Bits     Field                   Width
  31..26   COP0 010000             6
  25..21   C0 10000                5
  20..6    0 000 0000 0000 0000    15
  5..0     ERET 011000             6

Format: ERET
Description: ERET is the instruction for returning from an interrupt, exception, or error trap. Unlike a branch or jump instruction, ERET does not execute the next instruction. ERET must not itself be placed in a branch delay slot. If the processor is servicing a Level 2 exception (ERL = 1), then load the PC from the ErrorEPC and clear the ERL bit of the Status register (bit 2 in Status register). Otherwise (ERL = 0), load the PC from the EPC, and clear the EXL bit of the Status register (bit 1 in Status register).
Operation:
  if Status.ERL = 1 then
    PC ← ErrorEPC
    Status.ERL ← 0
  else
    PC ← EPC
    Status.EXL ← 0
  endif
Exceptions: Coprocessor Unusable exception
Implementation Note: ERET flushes the execution pipelines of the CPU before fetching the instruction from the target. Any pending loads, stores, ongoing multiplies, divides, multiply-accumulates and COP1 instructions are not flushed.
Programming Notes: A Reserved Instruction must not be placed in the branch delay slot immediately after an ERET instruction. Take care if any instruction is placed in that slot, because it may be executed incompletely before the pipeline is flushed. It is recommended that a NOP be placed in the branch delay slot.
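
The ERET Operation can be mirrored in a few lines of Python. This is only a sketch: the dictionary-based `status` argument and the function name are hypothetical, and pipeline flushing is not modeled.

```python
def eret(status: dict, epc: int, error_epc: int) -> int:
    """Return the new PC and clear ERL or EXL in place, as in the Operation."""
    if status["ERL"] == 1:
        status["ERL"] = 0      # returning from a Level 2 exception
        return error_epc
    status["EXL"] = 0          # returning from a Level 1 exception
    return epc
```

When both ERL and EXL are set, only ERL is cleared, so a second ERET is needed to leave the Level 1 handler.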
MFBPC    Move from Breakpoint Control Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     000            3

Format: MFBPC rt
Description: The contents of the Breakpoint Control register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Breakpoint Control]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFC0    Move from System Control Coprocessor    R4000

  Bits     Field             Width
  31..26   COP0 010000       6
  25..21   MF0 00000         5
  20..16   rt                5
  15..11   rd                5
  10..0    0 000 0000 0000   11

Format: MFC0 rt, rd
Description: The contents of coprocessor register rd of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, rd]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
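
The expression (data31)^32 || data31..0 replicates bit 31 of the 32-bit coprocessor value into the upper 32 bits of the 64-bit result. A minimal Python sketch of that sign extension, with a hypothetical function name:

```python
def sign_extend_32_to_64(data: int) -> int:
    """Return (data31)^32 || data31..0 as an unsigned 64-bit integer."""
    data &= 0xFFFFFFFF
    upper = 0xFFFFFFFF if data & 0x80000000 else 0
    return (upper << 32) | data
```

The same extension appears in the Operation of every move-from instruction in this appendix.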
MFDAB    Move from Data Address Breakpoint Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     100            3

Format: MFDAB rt
Description: The contents of the Data Address Breakpoint register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Data Address Breakpoint]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFDABM    Move from Data Address Breakpoint Mask Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     101            3

Format: MFDABM rt
Description: The contents of the Data Address Breakpoint Mask register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Data Address Breakpoint Mask]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFDVB    Move from Data Value Breakpoint Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     110            3

Format: MFDVB rt
Description: The contents of the Data Value Breakpoint register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Data Value Breakpoint]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFDVBM    Move from Data Value Breakpoint Mask Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     111            3

Format: MFDVBM rt
Description: The contents of the Data Value Breakpoint Mask register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Data Value Breakpoint Mask]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFIAB    Move from Instruction Address Breakpoint Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     010            3

Format: MFIAB rt
Description: The contents of the Instruction Address Breakpoint register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Instruction Address Breakpoint]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFIABM    Move from Instruction Address Breakpoint Mask Register    C790

  Bits     Field          Width
  31..26   COP0 010000    6
  25..21   MF0 00000      5
  20..16   rt             5
  15..11   Debug 11000    5
  10..3    0 0000 0000    8
  2..0     011            3

Format: MFIABM rt
Description: The contents of the Instruction Address Breakpoint Mask register of the COP0 are loaded into general register rt.
Operation:
  data ← CPR[0, Instruction Address Breakpoint Mask]
  GPR[rt] ← (data31)^32 || data31..0
Exceptions: Coprocessor Unusable exception
MFPC
Move from Performance Counter

Bits    31..26        25..21      20..16   15..11       10..6     5..1   0
Field   COP0 010000   MF0 00000   rt       Perf 11001   0 00000   reg    1
Width   6             5           5        5            5         5      1

C790
Format: MFPC rt, reg
Description: The contents of the Performance Counter register of COP0 are loaded into general register rt. The reg field of the instruction word specifies the number of the Performance Counter register. Only registers 0 and 1 are valid in the C790 implementation.
Operation:
data ← CPR[0, Performance Counter (reg)]
GPR[rt] ← (data_31)^32 || data_31..0
Exceptions: Coprocessor Unusable exception
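The performance register moves place the register number in bits 5..1 and use bit 0 to distinguish counter from control register. The following sketch encodes that layout. The function name encode_perf_move is hypothetical, and the range checks simply mirror the statements that only counters 0 and 1 and control register 0 are valid on the C790.

```python
COP0 = 0b010000               # bits 31..26
MF0, MT0 = 0b00000, 0b00100   # bits 25..21
PERF = 0b11001                # bits 15..11 (rd = Perf)

def encode_perf_move(mnemonic, rt, reg):
    """Encode MFPC, MTPC, MFPS or MTPS (illustrative helper, not a vendor API)."""
    if mnemonic not in ("MFPC", "MTPC", "MFPS", "MTPS"):
        raise ValueError("unknown mnemonic")
    if not 0 <= rt < 32:
        raise ValueError("rt must be a GPR number in 0..31")
    is_counter = mnemonic.endswith("PC")
    valid_regs = (0, 1) if is_counter else (0,)   # C790 restriction
    if reg not in valid_regs:
        raise ValueError("register number not valid on the C790")
    rs = MF0 if mnemonic.startswith("MF") else MT0
    bit0 = 1 if is_counter else 0
    # bits 10..6 are zero, reg occupies bits 5..1
    return (COP0 << 26) | (rs << 21) | (rt << 16) | (PERF << 11) | (reg << 1) | bit0
```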
MFPS
Move from Performance Event Specifier

Bits    31..26        25..21      20..16   15..11       10..6     5..1   0
Field   COP0 010000   MF0 00000   rt       Perf 11001   0 00000   reg    0
Width   6             5           5        5            5         5      1

C790
Format: MFPS rt, reg
Description: The contents of the Performance Control register of COP0 are loaded into general register rt. The reg field of the instruction word specifies the number of the Performance Control register. Only register 0 is valid in the C790 implementation.
Operation:
data ← CPR[0, Performance Control (reg)]
GPR[rt] ← (data_31)^32 || data_31..0
Exceptions: Coprocessor Unusable exception
MTBPC
Move to Breakpoint Control Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   000
Width   6             5           5        5             8             3

C790
Format: MTBPC rt
Description: The contents of general register rt are loaded into the Breakpoint Control register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Breakpoint Control] ← data
Programming Notes: All MTBPC instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTC0
Move to System Control Coprocessor

Bits    31..26        25..21      20..16   15..11   10..0
Field   COP0 010000   MT0 00100   rt       rd       0 000 0000 0000
Width   6             5           5        5        11

R4000
Format: MTC0 rt, rd
Description: The contents of general register rt are loaded into coprocessor register rd of COP0.
Operation:
data ← GPR[rt]
CPR[0, rd] ← data
Programming Notes:
1. All MTC0 instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update. There is one exception to this rule:
   a) An MTC0 instruction which loads the EntryHi COP0 register can be followed by a TLBWI or a TLBWR instruction without an intervening SYNC.P instruction. This special case is handled by a hardware interlock.
2. The MTC0 instruction to the EntryHi register MUST be executed either from unmapped space or from global mapped space (mapped space with a TLB entry which has the G bit set). Furthermore, the BTAC is flushed whenever the EntryHi register is updated.
3. Modifying CONFIG.K0 via an MTC0 instruction should not occur from kseg0 space.
4. A SYNC.L instruction is needed before executing an MTC0 instruction which modifies CONFIG.NBE or CONFIG.DCE.
5. Updating the performance counter registers via an MTC0 instruction while the performance counters are enabled will result in undefined counter values.
Exceptions: Coprocessor Unusable exception
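The SYNC.P barrier rule stated in the Programming Notes of the move-to instructions lends itself to a simple static check. The sketch below is illustrative only. It assumes a hypothetical program representation (a list of (mnemonic, operand) tuples, where operand is the COP0 register name for MTC0), and it interprets "followed by" strictly as "immediately followed by", which is an assumption rather than a statement from this manual.

```python
# Mnemonics whose Programming Notes require a following SYNC.P barrier.
NEEDS_BARRIER = {
    "MTC0", "MTBPC", "MTDAB", "MTDABM", "MTDVB",
    "MTDVBM", "MTIAB", "MTIABM", "MTPC", "MTPS",
}

def find_missing_sync_p(program):
    """Return indexes of move-to instructions not followed by SYNC.P.

    program is a list of (mnemonic, operand) tuples (hypothetical format).
    """
    missing = []
    for i, (mnemonic, operand) in enumerate(program):
        if mnemonic not in NEEDS_BARRIER:
            continue
        following = program[i + 1][0] if i + 1 < len(program) else None
        if following == "SYNC.P":
            continue
        # Documented exception: MTC0 to EntryHi followed by TLBWI or TLBWR.
        if mnemonic == "MTC0" and operand == "EntryHi" and following in ("TLBWI", "TLBWR"):
            continue
        missing.append(i)
    return missing
```

A checker of this kind only covers the barrier rule. It does not verify the other notes, such as the address space from which the instruction executes.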
MTDAB
Move to Data Address Breakpoint Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   100
Width   6             5           5        5             8             3

C790
Format: MTDAB rt
Description: The contents of general register rt are loaded into the Data Address Breakpoint register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Data Address Breakpoint] ← data
Programming Notes: All MTDAB instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTDABM
Move to Data Address Breakpoint Mask Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   101
Width   6             5           5        5             8             3

C790
Format: MTDABM rt
Description: The contents of general register rt are loaded into the Data Address Breakpoint Mask register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Data Address Breakpoint Mask] ← data
Programming Notes: All MTDABM instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTDVB
Move to Data Value Breakpoint Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   110
Width   6             5           5        5             8             3

C790
Format: MTDVB rt
Description: The contents of general register rt are loaded into the Data Value Breakpoint register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Data Value Breakpoint] ← data
Programming Notes: All MTDVB instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTDVBM
Move to Data Value Breakpoint Mask Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   111
Width   6             5           5        5             8             3

C790
Format: MTDVBM rt
Description: The contents of general register rt are loaded into the Data Value Breakpoint Mask register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Data Value Breakpoint Mask] ← data
Programming Notes: All MTDVBM instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTIAB
Move to Instruction Address Breakpoint Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   010
Width   6             5           5        5             8             3

C790
Format: MTIAB rt
Description: The contents of general register rt are loaded into the Instruction Address Breakpoint register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Instruction Address Breakpoint] ← data
Programming Notes: All MTIAB instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTIABM
Move to Instruction Address Breakpoint Mask Register

Bits    31..26        25..21      20..16   15..11        10..3         2..0
Field   COP0 010000   MT0 00100   rt       Debug 11000   0 0000 0000   011
Width   6             5           5        5             8             3

C790
Format: MTIABM rt
Description: The contents of general register rt are loaded into the Instruction Address Breakpoint Mask register of COP0.
Operation:
data ← GPR[rt]
CPR[0, Instruction Address Breakpoint Mask] ← data
Programming Notes: All MTIABM instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
MTPC
Move to Performance Counter

Bits    31..26        25..21      20..16   15..11       10..6     5..1   0
Field   COP0 010000   MT0 00100   rt       Perf 11001   0 00000   reg    1
Width   6             5           5        5            5         5      1

C790
Format: MTPC rt, reg
Description: The contents of general register rt are loaded into the Performance Counter register. The reg field of the instruction word specifies the number of the Performance Counter register. Only registers 0 and 1 are valid in the C790 implementation.
Operation:
data ← GPR[rt]
CPR[0, Performance Counter (reg)] ← data
Programming Notes: All MTPC instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update. Updating the performance counters via an MTPC instruction while the performance counters are enabled will result in undefined counter values.
Exceptions: Coprocessor Unusable exception
MTPS
Move to Performance Event Specifier

Bits    31..26        25..21      20..16   15..11       10..6     5..1   0
Field   COP0 010000   MT0 00100   rt       Perf 11001   0 00000   reg    0
Width   6             5           5        5            5         5      1

C790
Format: MTPS rt, reg
Description: The contents of general register rt are loaded into the Performance Control register. The reg field of the instruction word specifies the number of the Performance Control register. Only register 0 is valid in the C790 implementation.
Operation:
data ← GPR[rt]
CPR[0, Performance Control (reg)] ← data
Programming Notes: All MTPS instructions MUST be followed by a SYNC.P instruction as a barrier to guarantee COP0 register update.
Exceptions: Coprocessor Unusable exception
TLBP
Probe TLB for Matching Entry

Bits    31..26        25..21     20..6                  5..0
Field   COP0 010000   C0 10000   0 000 0000 0000 0000   TLBP 001000
Width   6             5          15                     6

R4000
Format: TLBP
Description: The Index register is loaded with the address of the TLB entry whose contents match the contents of the EntryHi register. If no TLB entry matches, the high-order bit of the Index register is set to 1. Note that the virtual address in the EntryHi register is masked with the corresponding mask field of the TLB entry prior to the comparison.
The architecture does not specify the operation of memory references associated with the instruction immediately after a TLBP instruction, nor is the operation specified if more than one TLB entry matches.
Operation:
Index ← 1 || 0^25 || undefined^6
for i in 0..TLBEntries-1
    if (TLB[i]_95..77 = ((not TLB[i]_127..109) and EntryHi_31..13)) and (TLB[i]_76 or (TLB[i]_71..64 = EntryHi_7..0)) then
        Index ← 0^26 || i_5..0
    endif
endfor
Programming Notes: The TLBP instruction MUST be immediately followed by a SYNC.P or ERET instruction.
Exceptions: Coprocessor Unusable exception
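The TLBP Operation can be modelled in software as shown below. This is a minimal sketch under stated assumptions: each TLB entry is represented as a 128-bit Python integer using the bit positions from the pseudocode, the function name tlb_probe is hypothetical, and the six low-order bits that the pseudocode leaves undefined on a miss are modelled as zero.

```python
def tlb_probe(tlb, entry_hi):
    """Return the value loaded into Index by a TLBP, per the Operation pseudocode.

    tlb is a list of 128-bit integers (assumed representation).
    """
    index = 1 << 31                       # no match: high-order bit set
    hi_vpn2 = (entry_hi >> 13) & 0x7FFFF  # EntryHi_31..13
    hi_asid = entry_hi & 0xFF             # EntryHi_7..0
    for i, entry in enumerate(tlb):
        mask = (entry >> 109) & 0x7FFFF   # TLB[i]_127..109
        vpn2 = (entry >> 77) & 0x7FFFF    # TLB[i]_95..77
        g = (entry >> 76) & 1             # TLB[i]_76
        asid = (entry >> 64) & 0xFF       # TLB[i]_71..64
        if vpn2 == (~mask & 0x7FFFF & hi_vpn2) and (g or asid == hi_asid):
            index = i & 0x3F              # 0^26 || i_5..0
    return index
```

As in the pseudocode, the loop does not stop at the first match, so the model returns the last matching entry. The manual states that the result of multiple matches is not specified, so this behaviour should not be relied upon.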
TLBR
Read Indexed TLB Entry

Bits    31..26        25..21     20..6                  5..0
Field   COP0 010000   C0 10000   0 000 0000 0000 0000   TLBR 000001
Width   6             5          15                     6

R4000
Format: TLBR
Description: The EntryHi, EntryLo, and PageMask registers are loaded with the contents of the TLB entry pointed at by the contents of the TLB Index register. The G bit (which controls ASID matching) read from the TLB is written into both the EntryLo0 and EntryLo1 registers.
Depending on the value in the PageMask register used for a TLB write instruction, the value read out from the TLB may not match what was originally written. See the Description for the TLBWI/TLBWR instructions.
Operation:
PageMask ← TLB[Index_5..0]_127..96
EntryHi ← (TLB[Index_5..0]_95..77 || 0^5 || TLB[Index_5..0]_71..64) and (not TLB[Index_5..0]_127..96)
EntryLo0 ← TLB[Index_5..0]_63..33 || TLB[Index_5..0]_76
EntryLo1 ← TLB[Index_5..0]_31..1 || TLB[Index_5..0]_76
Programming Notes: The TLBR instruction MUST be executed from either unmapped space or global mapped space (mapped space with a TLB entry which has the G bit set). The TLBR instruction MUST be immediately followed by a SYNC.P or ERET instruction.
Exceptions: Coprocessor Unusable exception
TLBWI
Write Indexed TLB Entry

Bits    31..26        25..21     20..6                  5..0
Field   COP0 010000   C0 10000   0 000 0000 0000 0000   TLBWI 000010
Width   6             5          15                     6

R4000
Format: TLBWI
Description: The TLB entry pointed at by the contents of the TLB Index register is loaded with the contents of the PageMask, EntryHi, EntryLo0 and EntryLo1 registers. The G bit of the TLB is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 registers. The virtual address in the EntryHi register is modified by the Mask field of the PageMask register before being written into the TLB.
The operation is invalid (and the results are unspecified) if the contents of the TLB Index register are greater than the number of TLB entries in the processor.
In the C790 processor, a TLB write instruction writes the whole page frame number from the EntryLo registers to the TLB entry. Depending on the page size specified in the corresponding PageMask register, the lower bits of the PFN may not be used for address translation, and the lower bits of VPN2 in the EntryHi register that are masked by the contents of the PageMask register are forced to zero during a TLB write. This does not affect TLB address translation; however, a TLB read may not retrieve what was originally written.
Operation:
TLB[Index_5..0] ← PageMask || ((EntryHi_31..13 || (EntryLo0_0 and EntryLo1_0) || EntryHi_11..0) and (not PageMask)) || EntryLo0_31..1 || 0 || EntryLo1_31..1 || 0
Programming Notes: The TLBWI instruction MUST be executed from either unmapped space or global mapped space (mapped space with a TLB entry which has the G bit set). The TLBWI instruction MUST be followed by an ERET or a SYNC.P instruction to ensure TLB update.
Exceptions: Coprocessor Unusable exception
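The statement that a TLB read may not retrieve what was originally written can be demonstrated by modelling the TLBWI and TLBR Operation pseudocode together. The sketch below treats a TLB entry as a 128-bit Python integer. The function names pack_tlb_entry and read_tlb_entry and the sample register values are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF

def pack_tlb_entry(page_mask, entry_hi, entry_lo0, entry_lo1):
    """Build the 128-bit entry written by TLBWI/TLBWR (per the pseudocode)."""
    g = entry_lo0 & entry_lo1 & 1                      # EntryLo0_0 and EntryLo1_0
    hi_word = (entry_hi & 0xFFFFE000) | (g << 12) | (entry_hi & 0xFFF)
    hi_word &= ~page_mask & MASK32                     # and (not PageMask)
    lo0_word = entry_lo0 & 0xFFFFFFFE                  # EntryLo0_31..1 || 0
    lo1_word = entry_lo1 & 0xFFFFFFFE                  # EntryLo1_31..1 || 0
    return ((page_mask & MASK32) << 96) | (hi_word << 64) | (lo0_word << 32) | lo1_word

def read_tlb_entry(entry):
    """Return (PageMask, EntryHi, EntryLo0, EntryLo1) as loaded by TLBR."""
    page_mask = (entry >> 96) & MASK32
    g = (entry >> 76) & 1
    vpn2 = (entry >> 77) & 0x7FFFF
    asid = (entry >> 64) & 0xFF
    entry_hi = ((vpn2 << 13) | asid) & ~page_mask & MASK32
    entry_lo0 = (((entry >> 33) & 0x7FFFFFFF) << 1) | g
    entry_lo1 = (((entry >> 1) & 0x7FFFFFFF) << 1) | g
    return page_mask, entry_hi, entry_lo0, entry_lo1
```

With PageMask = 0x6000, EntryHi = 0x6007, EntryLo0 = 0x41 and EntryLo1 = 0x80, reading the packed entry back gives EntryHi = 0x7 and EntryLo0 = 0x40: the masked VPN2 bits are zeroed, and the G bit reads as 0 because it was set in only one of the two EntryLo registers.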
TLBWR
Write Random TLB Entry

Bits    31..26        25..21     20..6                  5..0
Field   COP0 010000   C0 10000   0 000 0000 0000 0000   TLBWR 000110
Width   6             5          15                     6

R4000
Format: TLBWR
Description: The TLB entry pointed at by the contents of the TLB Random register is loaded with the contents of the PageMask, EntryHi, EntryLo0 and EntryLo1 registers. The G bit of the TLB is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 registers. The virtual address in the EntryHi register is modified by the Mask field of the PageMask register before being written into the TLB.
In the C790 processor, a TLB write instruction writes the whole page frame number from the EntryLo registers to the TLB entry. Depending on the page size specified in the corresponding PageMask register, the lower bits of the PFN may not be used for address translation, and the lower bits of VPN2 in the EntryHi register that are masked by the contents of the PageMask register are forced to zero during a TLB write. This does not affect TLB address translation; however, a TLB read may not retrieve what was originally written.
Operation:
TLB[Random_5..0] ← PageMask || ((EntryHi_31..13 || (EntryLo0_0 and EntryLo1_0) || EntryHi_11..0) and (not PageMask)) || EntryLo0_31..1 || 0 || EntryLo1_31..1 || 0
Programming Notes: The TLBWR instruction MUST be executed from either unmapped space or global mapped space (mapped space with a TLB entry which has the G bit set). The TLBWR instruction MUST be followed by an ERET or a SYNC.P instruction to ensure TLB update.
Exceptions: Coprocessor Unusable exception
C.2 COP0 Instruction Encoding
Instructions encoded by OpCode field (COP0, CACHE)

Bits 31..26: OpCode

OpCode        bits 28..26
bits 31..29   0 000     1 001     2 010   3 011   4 100   5 101   6 110   7 111
0 000         SPECIAL   REGIMM    J       JAL     BEQ     BNE     BLEZ    BGTZ
1 001         ADDI      ADDIU     SLTI    SLTIU   ANDI    ORI     XORI    LUI
2 010         COP0      COP1      *       *       BEQL    BNEL    BLEZL   BGTZL
3 011         DADDI     DADDIU    LDL     LDR     MMI     *       LQ      SQ
4 100         LB        LH        LWL     LW      LBU     LHU     LWR     LWU
5 101         SB        SH        SWL     SW      SDL     SDR     SWR     CACHE
6 110                   LWC1              PREF            LDC1            LD
7 111                   SWC1              *               SDC1            SD

Instructions encoded by rs field when OpCode field = COP0

Bits 31..26: OpCode = COP0; bits 25..21: rs

rs            bits 23..21
bits 25..24   0 000   1 001   2 010   3 011   4 100   5 101   6 110   7 111
0 00          MF0     *       *       *       MT0     *       *       *
1 01          BC0     *       *       *       *       *       *       *
2 10          C0      *       *       *       *       *       *       *
3 11          *       *       *       *       *       *       *       *

Instructions encoded by function field when OpCode field = COP0 & rd field = Debug

Bits 31..26: OpCode = COP0; bits 25..21: rs = MF0 or MT0; bits 15..11: rd = Debug*; bits 2..0: function

function      bits 2..0
rs field      0 000   1 001   2 010   3 011    4 100   5 101    6 110   7 111
MF0           MFBPC           MFIAB   MFIABM   MFDAB   MFDABM   MFDVB   MFDVBM
MT0           MTBPC           MTIAB   MTIABM   MTDAB   MTDABM   MTDVB   MTDVBM

Instructions encoded by function field when OpCode field = COP0 & rd field = Perf

Bits 31..26: OpCode = COP0; bits 25..21: rs = MF0 or MT0; bits 15..11: rd = Perf*; bit 0: function

function      bit 0
rs field      0       1
MF0           MFPS    MFPC
MT0           MTPS    MTPC

* Debug and Perf are the CP0 register names. Debug = 11000 (24), Perf = 11001 (25)

Instructions encoded by rt field when OpCode field = COP0 & rs field = BC0

Bits 31..26: OpCode = COP0; bits 25..21: rs = BC0; bits 20..16: rt

rt            bits 18..16
bits 20..19   0 000   1 001   2 010   3 011   4 100   5 101   6 110   7 111
0 00          BC0F    BC0T    BC0FL   BC0TL   *       *       *       *
1 01          *       *       *       *       *       *       *       *
2 10          *       *       *       *       *       *       *       *
3 11          *       *       *       *       *       *       *       *

Instructions encoded by function field when OpCode field = COP0 & rs field = C0

Bits 31..26: OpCode = COP0; bits 25..21: rs = C0; bits 5..0: function

function      bits 2..0
bits 5..3     0 000   1 001   2 010   3 011   4 100   5 101   6 110   7 111
0 000                 TLBR    TLBWI                           TLBWR
1 001         TLBP
2 010
3 011         ERET
4 100
5 101
6 110
7 111         EI      DI

This OpCode is reserved for future use. An attempt to execute it causes a Reserved Instruction exception.
This OpCode is reserved for future use. An attempt to execute it produces an undefined result. The result may be a Reserved Instruction exception but this is not guaranteed.
This OpCode indicates an instruction class. The instruction word must be further decoded by examining additional tables that show the values for another instruction field.
This OpCode is reserved for one of the following instructions which are currently not supported: DMULT, DMULTU, DDIV, DDIVU, LL, LLD, SC, SCD, LWC2, SWC2. An attempt to execute it causes a Reserved Instruction exception.
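The encoding tables in this section describe a multi-level decode: OpCode first, then the rs field, then rd, rt or function depending on the class. The following sketch walks those levels for COP0 instruction words. It is illustrative only: the function name decode_cop0 and the dictionary names are hypothetical, the fields that must be zero are not checked, and any encoding not listed in the tables is reported as None rather than modelled as an exception.

```python
DEBUG_FUNCS = {0: "BPC", 2: "IAB", 3: "IABM", 4: "DAB", 5: "DABM", 6: "DVB", 7: "DVBM"}
BC0_RT = {0: "BC0F", 1: "BC0T", 2: "BC0FL", 3: "BC0TL"}
C0_FUNCS = {
    0b000001: "TLBR", 0b000010: "TLBWI", 0b000110: "TLBWR",
    0b001000: "TLBP", 0b011000: "ERET", 0b111000: "EI", 0b111001: "DI",
}

def decode_cop0(word):
    """Return the mnemonic for a COP0 instruction word, or None if not listed."""
    if (word >> 26) & 0x3F != 0b010000:       # OpCode must be COP0
        return None
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    if rs in (0b00000, 0b00100):              # MF0 or MT0
        prefix = "MF" if rs == 0b00000 else "MT"
        if rd == 0b11000:                     # rd = Debug
            name = DEBUG_FUNCS.get(word & 0x7)
            return prefix + name if name else None
        if rd == 0b11001:                     # rd = Perf
            return prefix + ("PC" if word & 1 else "PS")
        return prefix + "C0"
    if rs == 0b01000:                         # BC0 class, decoded by rt
        return BC0_RT.get(rt)
    if rs == 0b10000:                         # C0 class, decoded by function
        return C0_FUNCS.get(word & 0x3F)
    return None
```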
Appendix D COP1 (FPU) Instruction Set Details
D. COP1 (FPU) Instruction Set Details
This appendix provides a detailed description of each of the COP1 coprocessor instructions. COP1 is implemented as a floating point unit (FPU). The instruction descriptions provide:
* a bit by bit field definition of the instruction word signifying that instruction
* a verbal description of the operation performed by the instruction
* pseudo-code identifying the entire sphere of influence of the instruction in terms of operand dependency and the state(s) of the processor changed.
Any state not mentioned in a description is unchanged by execution of the instruction being described.
D.1 Conventions Used in This Chapter
D.1.1 Instruction Description Notation and Functions
The Operation sections of the instruction descriptions use a high-level language notation, or pseudocode, to describe the instruction's operations. Symbols, functions, and structures used in the Operation sections are described here. The notation FPR as used here refers to the 32 floating-point registers FPR0 through FPR31 of the FPU.
D.1.2 Pseudocode Language Statement Execution
Each of the high-level language statements in an operation description is executed in sequential order (as modified by conditional and loop constructs).
D.1.3 Pseudocode Symbols
Special symbols used in the notation are described in Appendix A.
D.2 Definitions for Pseudocode Functions Used in Operation Descriptions
A variety of functions are used in the pseudocode descriptions to make the pseudocode more readable and also to abstract implementation-specific behavior. These functions are defined in Appendix A; in addition, certain COP1 FPU-specific functions are described in the following section. The following pseudocode notation is used in functions in the descriptions of floating-point operations:
Pseudocode Function               Meaning
StoreFPR (fpr, value)             FPR[fpr] ← value
ConvertFmt (value, fmt1, fmt2)    The value in the format fmt1 is converted to a value in the format fmt2.
Negate (value)                    The value is negated by changing the sign bit value.
Sign-extend (Value)               A sign-extended 32-bit value has bits 63..31 of equal value.
D.3 Instruction Descriptions
Descriptions of FPU Instructions follow.
ABS.fmt
Floating Point Absolute Value

Bits    31..26        25..21   20..16    15..11   10..6   5..0
Field   COP1 010001   fmt      0 00000   fs       fd      ABS 000101
Width   6             5        5         5        5       6

MIPS I
Format: ABS.S fd, fs
        ABS.D fd, fs
Purpose: To compute the absolute value of an FP value.
Description: fd ← absolute (fs)
The absolute value of the value in FPR fs is placed in FPR fd. The operand and result are values in format fmt. This operation is arithmetic; a NaN operand signals invalid operation.
Restrictions: The fields fs and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.
Operation:
StoreFPR (fd, fmt, AbsoluteValue (ValueFPR (fs, fmt)))
Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point: Unimplemented Operation, Invalid Operation
ADD.fmt
Floating Point Add

Bits    31..26        25..21   20..16   15..11   10..6   5..0
Field   COP1 010001   fmt      ft       fs       fd      ADD 000000
Width   6             5        5        5        5       6

MIPS I
Format: ADD.S fd, fs, ft
        ADD.D fd, fs, ft
Purpose: To add FP values.
Description: fd ← fs + ft
The value in FPR ft is added to the value in FPR fs. The result is calculated to infinite precision, rounded according to the current rounding mode in FCR31, and placed into FPR fd. The operands and result are values in format fmt.
Restrictions: The fields fs, ft and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.
Operation:
StoreFPR (fd, fmt, ValueFPR (fs, fmt) + ValueFPR (ft, fmt))
Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point: Unimplemented Operation, Invalid Operation, Inexact, Overflow, Underflow
BC1F
Branch on FP False

Bits    31..26        25..21      20..16       15..0
Field   COP1 010001   BC1 01000   BC1F 00000   offset
Width   6             5           5            16

MIPS I
Format: BC1F offset
Purpose: To test an FP condition code and do a PC-relative conditional branch.
Description: if (C = 0) then branch, where C is FCR31_23
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the result of the last floating point compare is false, branch to the effective target address after the instruction in the delay slot is executed. An FP condition code is set by the FP compare instruction, C.cond.fmt.
Operation:
I:    condition ← (FCR31_23 = 0)
      target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      endif
Exceptions:
Coprocessor Unusable
Reserved Instruction
Programming Notes: With the 18-bit signed instruction offset, the conditional branch range is ±128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
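The effective target address computation described for the FP branches can be sketched as follows. The function name bc1_target is hypothetical, and the sketch assumes that branch_pc is the address of the branch instruction itself, so the delay slot instruction is at branch_pc + 4.

```python
def bc1_target(branch_pc, offset_field):
    """Compute the effective target address of BC1F/BC1T when the branch is taken."""
    offset = offset_field & 0xFFFF
    if offset & 0x8000:              # sign bit of the 16-bit field (offset_15)
        offset -= 0x10000
    # 18-bit signed offset = field shifted left 2 bits, added to the delay slot address
    return branch_pc + 4 + (offset << 2)
```

The largest forward displacement is 0x7FFF << 2 = 131068 bytes and the largest backward displacement is 131072 bytes, both measured from the delay slot address, which corresponds to the 128 KB range stated in the Programming Notes.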
BC1T
Branch on FP True

Bits    31..26        25..21      20..16       15..0
Field   COP1 010001   BC1 01000   BC1T 00001   offset
Width   6             5           5            16

MIPS I
Format: BC1T offset
Purpose: To test an FP condition code and do a PC-relative conditional branch.
Description: if (C = 1) then branch, where C is FCR31_23
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the result of the last floating point compare is true, branch to the effective target address after the instruction in the delay slot is executed. An FP condition code is set by the FP compare instruction, C.cond.fmt.
Operation:
I:    condition ← (FCR31_23 = 1)
      target ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1:  if condition then
          PC ← PC + target
      endif
Exceptions:
Coprocessor Unusable
Reserved Instruction
Programming Notes: With the 18-bit signed instruction offset, the conditional branch range is ±128 KB. Use jump (J) or jump register (JR) instructions to branch to more distant addresses.
C.cond.fmt
Floating Point Compare

Bits    31..26        25..21   20..16   15..11   10..6     5..4    3..0
Field   COP1 010001   fmt      ft       fs       0 00000   FC 11   cond
Width   6             5        5        5        5         2       4

MIPS I
Format: C.cond.S fs, ft
        C.cond.D fs, ft
Purpose: To compare FP values and record the Boolean result in a condition code.
Description: C ← fs compare_cond ft
The value in FPR fs is compared to the value in FPR ft; the values are in format fmt. The comparison is exact and neither overflows nor underflows. If the comparison specified by cond_2..0 is true for the operand values, then the result is true, otherwise it is false. If no exception is taken, the result is written into condition code C; true is 1 and false is 0.
If cond_3 is set and at least one of the values is a NaN, an Invalid Operation condition is raised; the result depends on the FP exception model currently active. The Invalid Operation flag is set in the FCR31. If the Invalid Operation enable bit is set in the FCR31, no result is written and an Invalid Operation exception is taken immediately. Otherwise, the Boolean result is written into condition code C.
There are four mutually exclusive ordering relations for comparing floating-point values; one relation is always true and the others are false. The familiar relations are greater than, less than, and equal. In addition, the IEEE floating-point standard defines the relation unordered, which is true when at least one operand value is NaN; NaN compares unordered with everything, including itself. Comparisons ignore the sign of zero, so +0 equals -0.
The comparison condition is a logical predicate, or equation, of the ordering relations such as "less than or equal", "equal", "not less than", or "unordered or equal". Compare distinguishes sixteen comparison predicates. The Boolean result of the instruction is obtained by substituting the Boolean value of each ordering relation for the two FP values into the equation. If the equal relation is true, for example, then all four example predicates above would yield a true result. If the unordered relation is true then only the final predicate, "unordered or equal", would yield a true result.
Logical negation of a compare result allows eight distinct comparisons to test for sixteen predicates as shown in Table D-1.
Each mnemonic tests for both a predicate and its logical negation. For each mnemonic, compare tests the truth of the first predicate. When the first predicate is true, the result is true as shown in the "if predicate is true" column (note that the False predicate is never true and False/True do not follow the normal pattern). When the first predicate is true, the second predicate must be false, and vice versa. The truth of the second predicate is the logical negation of the instruction result. After a compare instruction, test for the truth of the first predicate with the Branch on FP True (BC1T) instruction and the truth of the second with Branch on FP False (BC1F).
Table D-1. FPU Comparisons Without Special Operand Exceptions

For all rows: cond field bit 3 = 0; Invalid Operation exception if QNaN: No.

Instr      cond    Comparison predicate:                                       Relation values   CC result if
mnemonic   2..0    name of predicate and logically negated predicate (abbr.)   >  <  =  ?        predicate is true
F          0       False [this predicate is always False,                      F  F  F  F        F
                     it never has a True result]
                   True (T)                                                    T  T  T  T
UN         1       Unordered                                                   F  F  F  T        T
                   Ordered (OR)                                                T  T  T  F        F
EQ         2       Equal                                                       F  F  T  F        T
                   Not Equal (NEQ)                                             T  T  F  T        F
UEQ        3       Unordered or Equal                                          F  F  T  T        T
                   Ordered or Greater than or Less than (OGL)                  T  T  F  F        F
OLT        4       Ordered or Less Than                                        F  T  F  F        T
                   Unordered or Greater than or Equal (UGE)                    T  F  T  T        F
ULT        5       Unordered or Less Than                                      F  T  F  T        T
                   Ordered or Greater than or Equal (OGE)                      T  F  T  F        F
OLE        6       Ordered or Less than or Equal                               F  T  T  F        T
                   Unordered or Greater Than (UGT)                             T  F  F  T        F
ULE        7       Unordered or Less than or Equal                             F  T  T  T        T
                   Ordered or Greater Than (OGT)                               T  F  F  F        F

key: "?" = unordered, ">" = greater than, "<" = less than, "=" is equal, "T" = True, "F" = False
There is another set of eight compare operations, distinguished by a cond_3 value of 1, testing the same sixteen conditions. For these additional comparisons, if at least one of the operands is a NaN, including Quiet NaN, then an Invalid Operation condition is raised. If the Invalid Operation condition is enabled in the FCR31, then an Invalid Operation exception occurs.

Table D-2. FPU Comparisons With Special Operand Exceptions for QNaNs

For all rows: cond field bit 3 = 1; Invalid Operation exception if QNaN: Yes.

Instr      cond    Comparison predicate:                                       Relation values   CC result if
mnemonic   2..0    name of predicate and logically negated predicate (abbr.)   >  <  =  ?        predicate is true
SF         0       Signaling False [this predicate always False]               F  F  F  F        F
                   Signaling True (ST)                                         T  T  T  T
NGLE       1       Not Greater than or Less than or Equal                      F  F  F  T        T
                   Greater than or Less than or Equal (GLE)                    T  T  T  F        F
SEQ        2       Signaling Equal                                             F  F  T  F        T
                   Signaling Not Equal (SNE)                                   T  T  F  T        F
NGL        3       Not Greater than or Less than                               F  F  T  T        T
                   Greater than or Less than (GL)                              T  T  F  F        F
LT         4       Less Than                                                   F  T  F  F        T
                   Not Less Than (NLT)                                         T  F  T  T        F
NGE        5       Not Greater than or Equal                                   F  T  F  T        T
                   Greater than or Equal (GE)                                  T  F  T  F        F
LE         6       Less than or Equal                                          F  T  T  F        T
                   Not Less than or Equal (NLE)                                T  F  F  T        F
NGT        7       Not Greater Than                                            F  T  T  T        T
                   Greater Than (GT)                                           T  F  F  F        F

key: "?" = unordered, ">" = greater than, "<" = less than, "=" is equal, "T" = True, "F" = False
Restrictions: The fields fs and ft must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.
Operation:
if NaN (ValueFPR (fs, fmt)) or NaN (ValueFPR (ft, fmt)) then
    less ← false
    equal ← false
    unordered ← true
    if cond_3 then
        SignalException (InvalidOperation)
    endif
else
    less ← ValueFPR (fs, fmt) < ValueFPR (ft, fmt)
    equal ← ValueFPR (fs, fmt) = ValueFPR (ft, fmt)
    unordered ← false
endif
condition ← (cond_2 and less) or (cond_1 and equal) or (cond_0 and unordered)
C ← condition
Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point: Unimplemented Operation, Invalid Operation
Programming Notes: FP computational instructions, including compare, that receive an operand value of Signaling NaN will raise the Invalid Operation condition. The comparisons that raise the Invalid Operation condition for Quiet NaNs in addition to SNaNs permit a simpler programming model if NaNs are errors. Using these compares, programs do not need explicit code to check for QNaNs causing the unordered relation. Instead, they take an exception and allow the exception handling system to deal with the error when it occurs. For example, consider a comparison in which we want to know if two numbers are equal, but for which unordered would be an error.

# comparisons using explicit tests for QNaN
        c.eq.d  $f2,$f4     # check for equal
        nop
        bc1t    L2          # it is equal
        c.un.d  $f2,$f4     # it is not equal, but might be unordered
        bc1t    ERROR       # unordered goes off to an error handler
# not-equal-case code here
        ...
# equal-case code here
L2:
# --------------------------------------------------------------
# comparison using comparisons that signal QNaN
        c.seq.d $f2,$f4     # check for equal
        nop
        bc1t    L2          # it is equal
        nop
# it is not unordered here...
# not-equal-case code here
        ...
# equal-case code here
L2:
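The way the four cond bits select a predicate in the C.cond.fmt Operation can be modelled as shown below. This is a simplified sketch: the function name fp_compare is an assumption, Python floats stand in for FPR values, Signaling NaNs cannot be distinguished from Quiet NaNs in this model, and the second element of the result only reports whether the Invalid Operation condition would be raised by cond bit 3. It does not model the enable bit in FCR31.

```python
import math

def fp_compare(fs, ft, cond):
    """Return (condition, invalid) for a 4-bit cond field, per the pseudocode."""
    if math.isnan(fs) or math.isnan(ft):
        less = equal = False
        unordered = True
        invalid = bool(cond & 0b1000)      # cond_3 set and a NaN operand
    else:
        less = fs < ft
        equal = fs == ft
        unordered = False
        invalid = False
    condition = bool(
        (cond & 0b100 and less)            # cond_2 and less
        or (cond & 0b010 and equal)        # cond_1 and equal
        or (cond & 0b001 and unordered)    # cond_0 and unordered
    )
    return condition, invalid
```

For example, cond = 3 (UEQ) with a NaN operand gives a true condition without raising Invalid Operation, whereas cond = 0b1010 (SEQ) with a NaN operand gives a false condition and raises it.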
CEIL.L.fmt
Floating-Point Ceiling Convert to Long Fixed-Point

Bits    31..26        25..21   20..16    15..11   10..6   5..0
Field   COP1 010001   fmt      0 00000   fs       fd      CEIL.L 001010
Width   6             5        5         5        5       6

MIPS III
Format: CEIL.L.S fd, fs
        CEIL.L.D fd, fs
Purpose: To convert an FP value to 64-bit fixed-point, rounding up.
Description: fd ← convert_and_round (fs)
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed-point format, rounding toward +∞ (rounding mode 2). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63 - 1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in the FCR31. If the Invalid Operation enable bit is set in the FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^63 - 1, is written to fd.
Restrictions: The fields fs and fd must specify valid FPRs; fs for type fmt and fd for long fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.
Operation:
StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt), fmt, L))
Exceptions:
Coprocessor Unusable
Reserved Instruction
Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
D-12
Appendix D COP1 (FPU) Instruction Set Details
CEIL.W.fmt: Floating-Point Ceiling Convert to Word Fixed-Point (MIPS II)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        CEIL.W
  Value    010001              00000                         001110
  Width    6         5         5         5         5         6
Format: CEIL.W.S fd, fs
        CEIL.W.D fd, fs

Purpose: To convert an FP value to 32-bit fixed-point, rounding up.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format, rounding toward +∞ (rounding mode 2). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for word fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
CFC1: Move Control Word from Floating-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..0
  Field    COP1      CFC1      rt        fs        0
  Value    010001    00010                         000 0000 0000
  Width    6         5         5         5         11
Format: CFC1 rt, fs

Purpose: To copy a word from an FPU control register to a GPR.

Description: rt ← FP_Control[fs]
Copy the 32-bit word from FP (coprocessor 1) control register fs into GPR rt, sign-extending it if the GPR is 64 bits wide.

Restrictions: Only a couple of control registers are defined for the floating-point unit. The result is undefined if fs specifies a register that does not exist.

Operation:
    GPR[rt] ← sign_extend(FCR[fs])

Exceptions: Coprocessor Unusable
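The sign extension applied when a 32-bit control word lands in a 64-bit GPR can be sketched as below. The helper name `sign_extend_32_to_64` is hypothetical, and the register image is modelled as an unsigned Python integer.

```python
def sign_extend_32_to_64(word: int) -> int:
    """Return the 64-bit register image of a 32-bit word.

    Bit 31 of the word is copied into bits 63..32 of the result.
    """
    word &= 0xFFFFFFFF
    if word & 0x80000000:
        word |= 0xFFFFFFFF00000000
    return word
```

The same extension applies to MFC1, which also moves a 32-bit word into a GPR.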
CTC1: Move Control Word to Floating-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..0
  Field    COP1      CTC1      rt        fs        0
  Value    010001    00110                         000 0000 0000
  Width    6         5         5         5         11
Format: CTC1 rt, fs

Purpose: To copy a word from a GPR to an FPU control register.

Description: FP_Control[fs] ← rt
Copy the low word from GPR rt into FP (coprocessor 1) control register fs.
Writing to control register 31, the Floating-Point Control and Status Register (FCR31), causes the appropriate exception if any cause bit and its corresponding enable bit are both set. The register is written before the exception occurs.

Restrictions: Only a couple of control registers are defined for the floating-point unit. The result is undefined if fs specifies a register that does not exist.

Operation:
    temp ← GPR[rt]_{31..0}
    FCR[fs] ← temp

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow, Underflow, Division by Zero
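The cause/enable check performed on a write to FCR31 can be sketched as follows. The bit positions used here are an assumption: they follow the conventional MIPS FCR31 layout (cause bits V, Z, O, U, I at bits 16..12 and the matching enable bits at bits 11..7) and should be checked against the FCR31 description in the FPU chapter. The function name is hypothetical.

```python
CAUSE_SHIFT = 12   # ASSUMED position of the lowest cause bit (I)
ENABLE_SHIFT = 7   # ASSUMED position of the lowest enable bit (I)
FIELD_MASK = 0x1F  # five IEEE conditions: V, Z, O, U, I


def ctc1_fcr31_traps(value: int) -> bool:
    """Return True if writing `value` to FCR31 would raise an exception,
    that is, if any cause bit and its corresponding enable bit are both set."""
    cause = (value >> CAUSE_SHIFT) & FIELD_MASK
    enable = (value >> ENABLE_SHIFT) & FIELD_MASK
    return (cause & enable) != 0
```

Software that restores a saved FCR31 image may want to clear the cause bits first so that the write itself does not trap.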
CVT.D.fmt: Floating-Point Convert to Double Floating-Point (MIPS I, III)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        CVT.D
  Value    010001              00000                         100001
  Width    6         5         5         5         5         6
Format: CVT.D.S fd, fs
        CVT.D.W fd, fs
        CVT.D.L fd, fs

Purpose: To convert an FP or fixed-point value to double FP.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in double floating-point format, rounded according to the current rounding mode in FCR31. The result is placed in FPR fd. If fmt is S or W, the operation is always exact.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for double floating-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact

Note: Overflow and Underflow exceptions never occur, because the range of the double-precision format covers every value of the other data types.
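The exactness statement can be checked numerically. The sketch below uses Python floats, which are IEEE 754 doubles, to test whether a 64-bit integer survives conversion to double unchanged; `cvt_d_l_is_exact` is a hypothetical helper name.

```python
def cvt_d_l_is_exact(value: int) -> bool:
    """Return True if the integer converts to double without rounding."""
    return int(float(value)) == value
```

Every 32-bit word value is exact because a double carries a 53-bit significand, whereas a long value such as 2^53+1 needs rounding and therefore corresponds to the Inexact condition listed for CVT.D.L.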
CVT.L.fmt: Floating-Point Convert to Long Fixed-Point (MIPS III)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        CVT.L
  Value    010001              00000                         100101
  Width    6         5         5         5         5         6
Format: CVT.L.S fd, fs
        CVT.L.D fd, fs

Purpose: To convert an FP value to 64-bit fixed-point.

Description: fd ← convert_and_round(fs)
Convert the value in format fmt in FPR fs to long fixed-point format, round according to the current rounding mode in FCR31, and place the result in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
CVT.S.fmt: Floating-Point Convert to Single Floating-Point (MIPS I, III)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        CVT.S
  Value    010001              00000                         100000
  Width    6         5         5         5         5         6
Format: CVT.S.D fd, fs
        CVT.S.W fd, fs
        CVT.S.L fd, fs

Purpose: To convert an FP or fixed-point value to single FP.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in single floating-point format, rounded according to the current rounding mode in FCR31. The result is placed in FPR fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for single floating-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow, Underflow
CVT.W.fmt: Floating-Point Convert to Word Fixed-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        CVT.W
  Value    010001              00000                         100100
  Width    6         5         5         5         5         6
Format: CVT.W.S fd, fs
        CVT.W.D fd, fs

Purpose: To convert an FP value to 32-bit fixed-point.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format, rounded according to the current rounding mode in FCR31. The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for word fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
DIV.fmt: Floating-Point Divide (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       ft        fs        fd        DIV
  Value    010001                                            000011
  Width    6         5         5         5         5         6
Format: DIV.S fd, fs, ft
        DIV.D fd, fs, ft

Purpose: To divide FP values.

Description: fd ← fs / ft
The value in FPR fs is divided by the value in FPR ft. The result is calculated to infinite precision, rounded according to the current rounding mode in FCR31, and placed into FPR fd. The operands and result are values in format fmt.

Restrictions: The fields fs, ft and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, fmt, ValueFPR(fs, fmt) / ValueFPR(ft, fmt))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Inexact, Unimplemented Operation, Division-by-zero, Invalid Operation, Overflow, Underflow
DMFC1: Doubleword Move From Floating-Point (MIPS III)

  Bits     31..26    25..21    20..16    15..11    10..0
  Field    COP1      DMFC1     rt        fs        0
  Value    010001    00001                         000 0000 0000
  Width    6         5         5         5         11
Format: DMFC1 rt, fs

Purpose: To copy a doubleword from an FPR to a GPR.

Description: rt ← fs
The doubleword contents of FPR fs are placed into GPR rt.
If the coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register emulation mode in a 64-bit processor), FPR fs is held in an even/odd register pair. The low word is taken from the even register fs and the high word from fs+1.

Restrictions: If fs does not specify an FPR that can contain a doubleword, the result is undefined; see Floating-Point Registers on page 10-2.

Operation:
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        data ← FGR[fs]
    elseif fs_0 = 0 then          /* valid specifier, 32-bit wide FGRs */
        data ← FGR[fs+1] || FGR[fs]
    else                          /* undefined for odd 32-bit FGRs */
        UndefinedResult()
    endif
    GPR[rt] ← data

Exceptions:
    Reserved Instruction
    Coprocessor Unusable
DMTC1: Doubleword Move To Floating-Point (MIPS III)

  Bits     31..26    25..21    20..16    15..11    10..0
  Field    COP1      DMTC1     rt        fs        0
  Value    010001    00101                         000 0000 0000
  Width    6         5         5         5         11
Format: DMTC1 rt, fs

Purpose: To copy a doubleword from a GPR to an FPR.

Description: fs ← rt
The doubleword contents of GPR rt are placed into FPR fs.
If the coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register emulation mode in a 64-bit processor), FPR fs is held in an even/odd register pair. The low word is placed in the even register fs and the high word is placed in fs+1.

Restrictions: If fs does not specify an FPR that can contain a doubleword, the result is undefined; see Floating-Point Registers on page 10-2.

Operation:
    data ← GPR[rt]
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        FGR[fs] ← data
    elseif fs_0 = 0 then          /* valid specifier, 32-bit wide FGRs */
        FGR[fs+1] ← data_{63..32}
        FGR[fs] ← data_{31..0}
    else                          /* undefined result for odd 32-bit FGRs */
        UndefinedResult()
    endif

Exceptions:
    Reserved Instruction
    Coprocessor Unusable
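The even/odd pairing used by DMTC1 and DMFC1 with 32-bit FGRs can be sketched with a plain list standing in for the register file. The function names, the `fgr_width` parameter and the use of `ValueError` for the undefined case are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def dmtc1(fgr: list, fs: int, data: int, fgr_width: int) -> None:
    """Write a doubleword into FPR fs."""
    if fgr_width == 64:
        fgr[fs] = data & MASK64
    elif fs % 2 == 0:
        fgr[fs + 1] = (data >> 32) & MASK32  # high word goes to the odd register
        fgr[fs] = data & MASK32              # low word goes to the even register
    else:
        raise ValueError("undefined result: odd fs with 32-bit FGRs")


def dmfc1(fgr: list, fs: int, fgr_width: int) -> int:
    """Read a doubleword from FPR fs."""
    if fgr_width == 64:
        return fgr[fs] & MASK64
    if fs % 2 == 0:
        return ((fgr[fs + 1] & MASK32) << 32) | (fgr[fs] & MASK32)
    raise ValueError("undefined result: odd fs with 32-bit FGRs")
```

A value written with `dmtc1` and read back with `dmfc1` using the same even register number is returned unchanged in either register width.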
FLOOR.L.fmt: Floating-Point Floor Convert to Long Fixed-Point (MIPS III)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        FLOOR.L
  Value    010001              00000                         001011
  Width    6         5         5         5         5         6
Format: FLOOR.L.S fd, fs
        FLOOR.L.D fd, fs

Purpose: To convert an FP value to 64-bit fixed-point, rounding down.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed-point format, rounding toward -∞ (rounding mode 3). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
FLOOR.W.fmt: Floating-Point Floor Convert to Word Fixed-Point (MIPS II)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        FLOOR.W
  Value    010001              00000                         001111
  Width    6         5         5         5         5         6
Format: FLOOR.W.S fd, fs
        FLOOR.W.D fd, fs

Purpose: To convert an FP value to 32-bit fixed-point, rounding down.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format, rounding toward -∞ (rounding mode 3). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for word fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
LDC1: Load Doubleword to Floating-Point (MIPS II)

  Bits     31..26    25..21    20..16    15..0
  Field    LDC1      base      ft        offset
  Value    110101
  Width    6         5         5         16
Format: LDC1 ft, offset(base)

Purpose: To load a doubleword from memory to an FPR.

Description: ft ← memory[base+offset]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched and placed in FPR ft. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
If the coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register emulation mode in a 64-bit processor), FPR ft is held in an even/odd register pair. The low word is placed in the even register ft and the high word is placed in ft+1.

Restrictions: If ft does not specify an FPR that can contain a doubleword, the result is undefined; see Floating-Point Registers on page 10-2.
An Address Error exception occurs if EffectiveAddress_{2..0} ≠ 0 (not doubleword-aligned).

Operation:
    vAddr ← sign_extend(offset) + GPR[base]
    if vAddr_{2..0} ≠ 0^3 then SignalException(AddressError) endif
    (pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
    data ← LoadMemory(uncached, DOUBLEWORD, pAddr, vAddr, DATA)
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        FGR[ft] ← data
    elseif ft_0 = 0 then          /* valid specifier, 32-bit wide FGRs */
        FGR[ft+1] ← data_{63..32}
        FGR[ft] ← data_{31..0}
    else                          /* undefined result for odd 32-bit FGRs */
        UndefinedResult()
    endif

Exceptions:
    Coprocessor Unusable
    TLB Refill
    TLB Invalid
    Address Error
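The effective-address calculation and alignment check shared by the FPU loads and stores can be sketched as follows. The helper names are hypothetical, and a `ValueError` stands in for the Address Error exception.

```python
def sign_extend_16(offset: int) -> int:
    """Interpret the 16-bit offset field as a signed value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def effective_address(base_value: int, offset: int, access_bytes: int) -> int:
    """Form vAddr = sign_extend(offset) + GPR[base] and check alignment.

    access_bytes is 8 for LDC1/SDC1 and 4 for LWC1/SWC1.
    """
    vaddr = (base_value + sign_extend_16(offset)) & 0xFFFFFFFFFFFFFFFF
    if vaddr & (access_bytes - 1):
        raise ValueError("Address Error")
    return vaddr
```

For example, a base of 0x1000 with the offset field 0xFFF8 (that is, -8) addresses the doubleword at 0x0FF8.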
LWC1: Load Word to Floating-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..0
  Field    LWC1      base      ft        offset
  Value    110001
  Width    6         5         5         16
Format: LWC1 ft, offset(base)

Purpose: To load a word from memory to an FPR.

Description: ft ← memory[base+offset]
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of coprocessor 1 general register ft. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
If the coprocessor 1 general registers are 64 bits wide, bits 63..32 of register ft become undefined. See Floating-Point Registers on page 10-2.

Restrictions: An Address Error exception occurs if EffectiveAddress_{1..0} ≠ 0 (not word-aligned).

Operation: 32-bit Processors
    I:      /* "mem" is aligned 64-bits from memory. Pick out correct bytes. */
            vAddr ← sign_extend(offset) + GPR[base]
            if vAddr_{1..0} ≠ 0^2 then SignalException(AddressError) endif
            (pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
            mem ← LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
    I + 1:  FGR[ft] ← mem

Operation: 64-bit Processors
    /* "mem" is aligned 64-bits from memory. Pick out correct bytes. */
    vAddr ← sign_extend(offset) + GPR[base]
    if vAddr_{1..0} ≠ 0^2 then SignalException(AddressError) endif
    (pAddr, uncached) ← AddressTranslation(vAddr, DATA, LOAD)
    pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
    mem ← LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
    bytesel ← vAddr_{2..0} xor (BigEndianCPU || 0^2)
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        FGR[ft] ← undefined^32 || mem_{31+8*bytesel..8*bytesel}
    else                          /* 32-bit wide FGRs */
        FGR[ft] ← mem_{31+8*bytesel..8*bytesel}
    endif

Exceptions:
    Coprocessor Unusable
    TLB Refill
    TLB Invalid
    Address Error
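The byte-lane selection in the 64-bit operation can be sketched as below. The sketch covers only the `bytesel` step (it leaves out the ReverseEndian adjustment of pAddr), and `select_word` is a hypothetical helper name.

```python
def select_word(mem: int, vaddr: int, big_endian_cpu: bool) -> int:
    """Pick the addressed 32-bit word out of an aligned 64-bit memory doubleword.

    bytesel = vAddr[2..0] xor (BigEndianCPU || 00), which is 0 or 4 for a
    word-aligned address; the word is bits 31+8*bytesel .. 8*bytesel of mem.
    """
    bytesel = (vaddr & 0x7) ^ ((1 if big_endian_cpu else 0) << 2)
    return (mem >> (8 * bytesel)) & 0xFFFFFFFF
```

With a little-endian CPU the word at the lower address comes from the low half of the doubleword; with a big-endian CPU it comes from the high half.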
MFC1: Move Word from Floating-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..0
  Field    COP1      MFC1      rt        fs        0
  Value    010001    00000                         000 0000 0000
  Width    6         5         5         5         11
Format: MFC1 rt, fs

Purpose: To copy a word from an FPU (COP1) general register to a GPR.

Description: rt ← fs
The low word from FPR fs is placed into the low word of GPR rt. If GPR rt is 64 bits wide, the value is sign-extended. See Floating-Point Registers on page 10-2.

Restrictions: None

Operation:
    GPR[rt] ← sign_extend(FPR[fs]_{31..0})

Exceptions: Coprocessor Unusable
MOV.fmt: Floating-Point Move (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        MOV
  Value    010001              00000                         000110
  Width    6         5         5         5         5         6
Format: MOV.S fd, fs
        MOV.D fd, fs

Purpose: To move an FP value between FPRs.

Description: fd ← fs
The value in FPR fs is placed into FPR fd. The source and destination are values in format fmt.
The move is non-arithmetic; it causes no IEEE 754 exceptions.

Restrictions: The fields fs and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Unimplemented Operation
MTC1: Move Word to Floating-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..0
  Field    COP1      MTC1      rt        fs        0
  Value    010001    00100                         000 0000 0000
  Width    6         5         5         5         11
Format: MTC1 rt, fs

Purpose: To copy a word from a GPR to an FPU (COP1) general register.

Description: fs ← rt
The low word in GPR rt is placed into the low word of floating-point (coprocessor 1) general register fs. If the coprocessor 1 general registers are 64 bits wide, bits 63..32 of register fs become undefined. See Floating-Point Registers on page 10-2.

Operation:
    data ← GPR[rt]_{31..0}
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        FGR[fs] ← undefined^32 || data
    else                          /* 32-bit wide FGRs */
        FGR[fs] ← data
    endif

Exceptions: Coprocessor Unusable
MUL.fmt: Floating-Point Multiply (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       ft        fs        fd        MUL
  Value    010001                                            000010
  Width    6         5         5         5         5         6
Format: MUL.S fd, fs, ft
        MUL.D fd, fs, ft

Purpose: To multiply FP values.

Description: fd ← fs × ft
The value in FPR fs is multiplied by the value in FPR ft. The result is calculated to infinite precision, rounded according to the current rounding mode in FCR31, and placed into FPR fd. The operands and result are values in format fmt.

Restrictions: The fields fs, ft and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, fmt, ValueFPR(fs, fmt) * ValueFPR(ft, fmt))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
NEG.fmt: Floating-Point Negate (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        NEG
  Value    010001              00000                         000111
  Width    6         5         5         5         5         6
Format: NEG.S fd, fs
        NEG.D fd, fs

Purpose: To negate a floating-point value.

Description: fd ← -(fs)
The value in FPR fs is negated and placed into FPR fd. The value is negated by changing the sign bit value. The operand and result are values in format fmt.
This operation is arithmetic; a NaN operand signals Invalid Operation.

Restrictions: The fields fs and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt)))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Unimplemented Operation, Invalid Operation
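Negation by changing the sign bit can be illustrated on the raw bit pattern of a double. This sketch covers only the sign-bit flip, not the NaN signaling mentioned above; all three helper names are hypothetical.

```python
import struct


def double_to_bits(value: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_double(bits: int) -> float:
    """Return the double encoded by a 64-bit IEEE 754 pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def neg_d_bits(bits: int) -> int:
    """Negate a double by inverting bit 63, the sign bit."""
    return (bits ^ (1 << 63)) & 0xFFFFFFFFFFFFFFFF
```

Because only the sign bit changes, negating +0 yields -0 and the magnitude bits are never rounded.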
ROUND.L.fmt: Floating-Point Round to Long Fixed-Point (MIPS III)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        ROUND.L
  Value    010001              00000                         001000
  Width    6         5         5         5         5         6
Format: ROUND.L.S fd, fs
        ROUND.L.D fd, fs

Purpose: To convert an FP value to 64-bit fixed-point, rounding to nearest.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed-point format, rounding to nearest/even (rounding mode 0). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Inexact, Unimplemented Operation, Overflow, Invalid Operation
ROUND.W.fmt: Floating-Point Round to Word Fixed-Point (MIPS II)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        ROUND.W
  Value    010001              00000                         001100
  Width    6         5         5         5         5         6
Format: ROUND.W.S fd, fs
        ROUND.W.D fd, fs

Purpose: To convert an FP value to 32-bit fixed-point, rounding to nearest.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format, rounding to nearest/even (rounding mode 0). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for word fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Inexact, Unimplemented Operation, Overflow, Invalid Operation
SDC1: Store Doubleword from Floating-Point (MIPS II)

  Bits     31..26    25..21    20..16    15..0
  Field    SDC1      base      ft        offset
  Value    111101
  Width    6         5         5         16
Format: SDC1 ft, offset(base)

Purpose: To store a doubleword from an FPR to memory.

Description: memory[base+offset] ← ft
The 64-bit doubleword in FPR ft is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
If the coprocessor 1 general registers are 32 bits wide (a native 32-bit processor or 32-bit register emulation mode in a 64-bit processor), FPR ft is held in an even/odd register pair. The low word is taken from the even register ft and the high word from ft+1.

Restrictions: If ft does not specify an FPR that can contain a doubleword, the result is undefined; see Floating-Point Registers on page 10-2.
An Address Error exception occurs if EffectiveAddress_{2..0} ≠ 0 (not doubleword-aligned).

Operation:
    vAddr ← sign_extend(offset) + GPR[base]
    if vAddr_{2..0} ≠ 0^3 then SignalException(AddressError) endif
    (pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        data ← FGR[ft]
    elseif ft_0 = 0 then          /* valid specifier, 32-bit wide FGRs */
        data ← FGR[ft+1] || FGR[ft]
    else                          /* undefined for odd 32-bit FGRs */
        UndefinedResult()
    endif
    StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

Exceptions:
    Coprocessor Unusable
    TLB Refill
    TLB Invalid
    TLB Modified
    Address Error
SQRT.fmt: Floating-Point Square Root (MIPS II)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        SQRT
  Value    010001              00000                         000100
  Width    6         5         5         5         5         6
Format: SQRT.S fd, fs
        SQRT.D fd, fs

Purpose: To compute the square root of an FP value.

Description: fd ← SQRT(fs)
The square root of the value in FPR fs is calculated to infinite precision, rounded according to the current rounding mode in FCR31, and placed into FPR fd. The operand and result are values in format fmt.
If the value in FPR fs is -0, the result is -0.

Restrictions: If the value in FPR fs is less than 0, an Invalid Operation condition is raised.
The fields fs and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt)))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Inexact, Unimplemented Operation, Invalid Operation
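The two special cases named above, the -0 operand and operands less than zero, can be sketched as follows. `sqrt_fmt` is a hypothetical name, and a Python exception stands in for the Invalid Operation condition.

```python
import math


def sqrt_fmt(value: float) -> float:
    """Square root with the special cases of SQRT.fmt made explicit."""
    if value < 0.0:
        # -0.0 < 0.0 is False, so negative zero does not take this path.
        raise ArithmeticError("Invalid Operation")
    if value == 0.0:
        return value  # preserves the sign of zero: -0 gives -0
    return math.sqrt(value)
```

The sign of a zero result can be inspected with `math.copysign(1.0, result)`, since -0.0 and 0.0 compare equal.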
SUB.fmt: Floating-Point Subtract (MIPS I)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       ft        fs        fd        SUB
  Value    010001                                            000001
  Width    6         5         5         5         5         6
Format: SUB.S fd, fs, ft
        SUB.D fd, fs, ft

Purpose: To subtract FP values.

Description: fd ← fs - ft
The value in FPR ft is subtracted from the value in FPR fs. The result is calculated to infinite precision, rounded according to the current rounding mode in FCR31, and placed into FPR fd. The operands and result are values in format fmt.

Restrictions: The fields fs, ft and fd must specify FPRs valid for operands of type fmt; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, fmt, ValueFPR(fs, fmt) - ValueFPR(ft, fmt))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
SWC1: Store Word from Floating-Point (MIPS I)

  Bits     31..26    25..21    20..16    15..0
  Field    SWC1      base      ft        offset
  Value    111001
  Width    6         5         5         16
Format: SWC1 ft, offset(base)

Purpose: To store a word from an FPR to memory.

Description: memory[base+offset] ← ft
The low 32-bit word from FPR ft is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

Restrictions: An Address Error exception occurs if EffectiveAddress_{1..0} ≠ 0 (not word-aligned).

Operation: 32-bit Processors
    vAddr ← sign_extend(offset) + GPR[base]
    if vAddr_{1..0} ≠ 0^2 then SignalException(AddressError) endif
    (pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)
    data ← FGR[ft]
    StoreMemory(uncached, WORD, data, pAddr, vAddr, DATA)

Operation: 64-bit Processors
    vAddr ← sign_extend(offset) + GPR[base]
    if vAddr_{1..0} ≠ 0^2 then SignalException(AddressError) endif
    (pAddr, uncached) ← AddressTranslation(vAddr, DATA, STORE)
    pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^2))
    bytesel ← vAddr_{2..0} xor (BigEndianCPU || 0^2)
    /* the bytes of the word are moved into the correct byte lanes */
    if SizeFGR() = 64 then        /* 64-bit wide FGRs */
        data ← 0^{32-8*bytesel} || FGR[ft]_{31..0} || 0^{8*bytesel}    /* top or bottom wd of 64-bit data */
    else                          /* 32-bit wide FGRs */
        data ← 0^{32-8*bytesel} || FGR[ft] || 0^{8*bytesel}            /* top or bottom wd of 64-bit data */
    endif
    StoreMemory(uncached, WORD, data, pAddr, vAddr, DATA)

Exceptions:
    Coprocessor Unusable
    TLB Refill
    TLB Invalid
    TLB Modified
    Address Error
TRUNC.L.fmt: Floating-Point Truncate to Long Fixed-Point (MIPS III)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        TRUNC.L
  Value    010001              00000                         001001
  Width    6         5         5         5         5         6
Format: TRUNC.L.S fd, fs
        TRUNC.L.D fd, fs

Purpose: To convert an FP value to 64-bit fixed-point, rounding toward zero.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed-point format, rounding toward zero (rounding mode 1). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^63 to 2^63-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^63-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
TRUNC.W.fmt: Floating-Point Truncate to Word Fixed-Point (MIPS II)

  Bits     31..26    25..21    20..16    15..11    10..6     5..0
  Field    COP1      fmt       0         fs        fd        TRUNC.W
  Value    010001              00000                         001101
  Width    6         5         5         5         5         6
Format: TRUNC.W.S fd, fs
        TRUNC.W.D fd, fs

Purpose: To convert an FP value to 32-bit fixed-point, rounding toward zero.

Description: fd ← convert_and_round(fs)
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed-point format, rounding toward zero (rounding mode 1). The result is placed in FPR fd.
When the source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1, the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in FCR31. If the Invalid Operation enable bit is set in FCR31, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is written to fd.

Restrictions: The fields fs and fd must specify valid FPRs: fs for type fmt and fd for word fixed-point; see Floating-Point Registers on page 10-2. If they are not valid, the result is undefined.

Operation:
    StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))

Exceptions:
    Coprocessor Unusable
    Reserved Instruction
    Floating-Point: Invalid Operation, Unimplemented Operation, Inexact, Overflow
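The four fixed rounding modes used by ROUND.W, TRUNC.W, CEIL.W and FLOOR.W (modes 0 to 3) can be compared side by side. The sketch below is illustrative only: `convert_to_word` is a hypothetical name, and it models the case where the Invalid Operation enable bit is clear, so the default result is returned.

```python
import math

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

ROUNDERS = {
    0: round,       # to nearest/even, as in ROUND.W (Python rounds ties to even)
    1: math.trunc,  # toward zero, as in TRUNC.W
    2: math.ceil,   # toward +infinity, as in CEIL.W
    3: math.floor,  # toward -infinity, as in FLOOR.W
}


def convert_to_word(value: float, mode: int) -> int:
    """Convert to a 32-bit integer using rounding mode 0..3."""
    if math.isnan(value) or math.isinf(value):
        return INT32_MAX  # default result for an Invalid Operation condition
    result = ROUNDERS[mode](value)
    if not (INT32_MIN <= result <= INT32_MAX):
        return INT32_MAX
    return result
```

For the input -2.5 the four modes give -2, -2, -2 and -3; for 2.5 they give 2, 2, 3 and 2, which shows that the modes differ only on values that are not already integers.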
D.4 COP1 Instruction Encoding
Instructions encoded by the OpCode field (COP1, LWC1, SWC1, LDC1, SDC1)

  Bits     31..26    25..0
  Field    OpCode

  Rows: OpCode bits 31..29. Columns: OpCode bits 28..26.

  31..29   0 000     1 001     2 010     3 011     4 100     5 101     6 110     7 111
  0 000    SPECIAL   REGIMM    J         JAL       BEQ       BNE       BLEZ      BGTZ
  1 001    ADDI      ADDIU     SLTI      SLTIU     ANDI      ORI       XORI      LUI
  2 010    COP0      COP1      *         *         BEQL      BNEL      BLEZL     BGTZL
  3 011    DADDI     DADDIU    LDL       LDR       MMI       *         LQ        SQ
  4 100    LB        LH        LWL       LW        LBU       LHU       LWR       LWU
  5 101    SB        SH        SWL       SW        SDL       SDR       SWR       CACHE
  6 110              LWC1                PREF                LDC1                LD
  7 111              SWC1                *                   SDC1                SD
Instructions encoded by the rs field when OpCode field = COP1

  Bits     31..26           25..21    20..0
  Field    OpCode = COP1    rs

  Rows: rs bits 23..21. Columns: rs bits 25..24.

  23..21   0 00      1 01      2 10      3 11
  0 000    MFC1      BC1       S
  1 001    DMFC1     *         D
  2 010    CFC1      *
  3 011    *         *
  4 100    MTC1      *         W
  5 101    DMTC1     *         L
  6 110    CTC1      *
  7 111    *         *
Instructions encoded by the rt field when OpCode field = COP1 and rs field = BC1

  Bits     31..26           25..21      20..16    15..0
  Field    OpCode = COP1    rs = BC1    rt

  Rows: rt bits 18..16. Columns: rt bits 20..19.

  18..16   0 00      1 01      2 10      3 11
  0 000    BC1F      *         *         *
  1 001    BC1T      *         *         *
  2 010    *         *         *         *
  3 011    *         *         *         *
  4 100    *         *         *         *
  5 101    *         *         *         *
  6 110    *         *         *         *
  7 111    *         *         *         *
Instructions encoded by the function field when OpCode field = COP1 and rs field = S, D

  Bits     31..26           25..21       20..6    5..0
  Field    OpCode = COP1    rs = S, D             function

  Rows: function bits 5..3. Columns: function bits 2..0.

  5..3     0 000     1 001     2 010     3 011     4 100     5 101     6 110     7 111
  0 000    ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
  1 001    ROUND.L   TRUNC.L   CEIL.L    FLOOR.L   ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
  2 010
  3 011
  4 100    CVT.S     CVT.D                         CVT.W     CVT.L
  5 101
  6 110    C.F       C.UN      C.EQ      C.UEQ     C.OLT     C.ULT     C.OLE     C.ULE
  7 111    C.SF      C.NGLE    C.SEQ     C.NGL     C.LT      C.NGE     C.LE      C.NGT
Instructions encoded by the function field when OpCode field = COP1 and rs field = W, L

  Bits     31..26           25..21       20..6    5..0
  Field    OpCode = COP1    rs = W, L             function

  Rows: function bits 5..3. Columns: function bits 2..0.

  5..3     0 000     1 001     2 010     3 011     4 100     5 101     6 110     7 111
  0 000
  1 001
  2 010
  3 011
  4 100    CVT.S     CVT.D
  5 101
  6 110
  7 111
*   This OpCode is reserved for future use. An attempt to execute it causes a Reserved Instruction exception but this is not guaranteed.

    This OpCode is reserved for future use. An attempt to execute it produces an undefined result. The result may be an Unimplemented Operation exception.

    This OpCode indicates an instruction class. The instruction word must be further decoded by examining additional tables that show the values for another instruction field.

    This OpCode is reserved for one of the following instructions which are currently not supported: DMULT, DMULTU, DDIV, DDIVU, LL, LLD, SC, SCD, LWC2, SWC2. An attempt to execute it causes a Reserved Instruction exception.
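The field layout and encoding tables above can be combined into a small encoder and decoder for the COP1 arithmetic and conversion instructions. This is a sketch rather than an assembler: the function names and table variables are hypothetical, the field values are taken from the tables in this appendix, and neither operand validity nor which fmt/function combinations are legal is checked.

```python
COP1 = 0b010001
FMT = {"S": 0b10000, "D": 0b10001, "W": 0b10100, "L": 0b10101}
FUNCTION = {name: code for code, name in enumerate(
    ["ADD", "SUB", "MUL", "DIV", "SQRT", "ABS", "MOV", "NEG",
     "ROUND.L", "TRUNC.L", "CEIL.L", "FLOOR.L",
     "ROUND.W", "TRUNC.W", "CEIL.W", "FLOOR.W"])}
FUNCTION.update({"CVT.S": 0b100000, "CVT.D": 0b100001,
                 "CVT.W": 0b100100, "CVT.L": 0b100101})

FMT_NAMES = {code: name for name, code in FMT.items()}
FUNCTION_NAMES = {code: name for name, code in FUNCTION.items()}


def encode_cop1(op: str, fmt: str, fd: int, fs: int, ft: int = 0) -> int:
    """Pack OpCode | fmt | ft | fs | fd | function into a 32-bit word."""
    return ((COP1 << 26) | (FMT[fmt] << 21) | ((ft & 0x1F) << 16)
            | ((fs & 0x1F) << 11) | ((fd & 0x1F) << 6) | FUNCTION[op])


def decode_cop1(word: int):
    """Unpack a word produced by encode_cop1 into (op, fmt, fd, fs, ft)."""
    if (word >> 26) & 0x3F != COP1:
        raise ValueError("not a COP1 instruction")
    fmt = FMT_NAMES[(word >> 21) & 0x1F]
    op = FUNCTION_NAMES[word & 0x3F]
    return (op, fmt, (word >> 6) & 0x1F, (word >> 11) & 0x1F, (word >> 16) & 0x1F)
```

For instance, `encode_cop1("DIV", "S", fd=2, fs=4, ft=6)` yields 0x46062083, and decoding that word returns the original fields.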